Preface


The best data is the data that we can see and understand. As developers and data scientists, 
we want to create and build the most comprehensive and understandable visualizations.

It is not always simple; we need to find the data, read it, clean it, filter it, and then use the
right tool to visualize it. This book explains how to read, clean, and visualize data and turn it
into information with straightforward and simple (and sometimes not so simple) recipes.

How to read local data, remote data, CSV, JSON, and data from relational databases is all
explained in this book.

Some simple plots can be drawn with one simple line in Python using matplotlib, but
performing more advanced charting requires knowledge of more than just Python. We need
to understand information theory and the aesthetics of human perception to produce the most
appealing visualizations.

This book will explain some practices behind plotting with matplotlib in Python, the statistics used,
and usage examples for different charting features that we should use in an optimal way.


What this book covers 


Chapter 1, Preparing Your Working Environment, covers a set of installation recipes and advice
on how to install the required Python packages and libraries on your platform.

Chapter 2, Knowing Your Data, introduces you to common data formats and how to read and
write them, be it CSV, JSON, XLS, or relational databases.

Chapter 3, Drawing Your First Plots and Customizing Them, starts with drawing simple plots
and covers some customization.

Chapter 4, More Plots and Customizations, follows up from the previous chapter and covers
more advanced charts and grid customization.

Chapter 5, Making 3D Visualizations, covers three-dimensional data visualizations such as
3D bars, 3D histograms, and also matplotlib animations.
Chapter 6, Plotting Charts with Images and Maps, deals with image processing, projecting
data onto maps, and creating CAPTCHA test images.

Chapter 7, Using Right Plots to Understand Data, covers explanations and recipes on some
more advanced plotting techniques such as spectrograms and correlations.

Chapter 8, More on matplotlib Gems, covers a set of charts such as Gantt charts, box plots,
and whisker plots, and it also explains how to use LaTeX for rendering text in matplotlib.

Chapter 9, Visualizations on the Clouds with Plot.ly, introduces how to use Plot.ly to create
and share your visualizations on its cloud environment.


What you need for this book 


For this book, you will need Python 2.7.3 or a later version installed on your operating system.

Another software package used in this book is IPython, which is an interactive Python
environment that is very powerful and flexible. It can be installed using package
managers for Linux-based OSes or prepared installers for Windows and Mac OS X.

If you are new to Python installation and software installation in general, it is highly
recommended to use prepackaged scientific Python distributions such as Anaconda,
Enthought Python Distribution, or Python(x,y).

Other required software mainly comprises Python packages that are all installed using the
Python installation manager, pip, which itself is installed using Python's easy_install setup tool.


Who this book is for 


Python Data Visualization Cookbook, Second Edition is for developers and data scientists who
already use Python and want to learn how to create visualizations of their data in a practical
way. If you have heard about data visualization but don't know where to start, this book will
guide you from the start and help you understand data, data formats, data visualization, and
how to use Python to visualize data.

You will need to know some general programming concepts, and any kind of programming
experience will be helpful. However, the code in this book is explained almost line by line.

You don't need math for this book; every concept that is introduced is thoroughly explained
in plain English, and references are available for further interest in the topic.
Sections


In this book, you will find several headings that appear frequently (Getting ready, How to do it,
How it Works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:


Getting ready 


This section tells you what to expect in the recipe, and describes how to set up any software or
any preliminary settings required for the recipe.


How to do it... 


This section contains the steps required to foiiow the recipe. 


How it Works... 


This section usuaiiy consists of a detaiied expianation of what happened in the previous section. 


There's more... 


This section consists of additional information about the recipe in order to make the reader
more knowledgeable about the recipe.


See also


This section provides helpful links to other useful information for the recipe.


Conventions 


In this book, you will find a number of styles of text that distinguish between different kinds of
information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames,
dummy URLs, user input, and Twitter handles are shown as follows: "We packed our little demo
in the DemoPIL class, so that we can extend it easily, while sharing the common code around
the demo function, run_fixed_filters_demo."
A block of code is set as follows:

def my_function(x):
    return x*x

When we wish to draw your attention to a particular part of a code block, the relevant lines or
items are set in bold:

for a in range(10):
    print a

Any command-iine input or output is written as foiiows: 

$ sudo python setup.py install 



Warnings or important notes appear in a box like this. 


Tips and tricks appear like this. 


Reader feedback


Feedback from our readers is always welcome. Let us know what you think about this
book: what you liked or may have disliked. Reader feedback is important for us to develop
titles that you really get the most out of.

To send us general feedback, simply send an e-mail to feedback@packtpub.com, and
mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or
contributing to a book, see our author guide on www.packtpub.com/authors.


Customer support 


Now that you are the proud owner of a Packt book, we have a number of things to help you to 
get the most from your purchase. 
Downloading the example code


You can download the example code files for all Packt books you have purchased from your
account at http://www.packtpub.com. If you purchased this book elsewhere, you can
visit http://www.packtpub.com/support and register to have the files e-mailed directly
to you.


Downloading the color images of this book 


We also provide you with a PDF file that has color images of the screenshots/diagrams used 
in this book. The color images will help you better understand the changes in the output. 

You can download this file from:
http://www.packtpub.com/sites/default/files/downloads/PythonDataVisualizationCookbookSecondEdition_ColoredImages.pdf.


Errata 


Although we have taken every care to ensure the accuracy of our content, mistakes do happen.
If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be
grateful if you would report this to us. By doing so, you can save other readers from frustration
and help us improve subsequent versions of this book. If you find any errata, please report them
by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on
the errata submission form link, and entering the details of your errata. Once your errata are
verified, your submission will be accepted and the errata will be uploaded to our website or
added to any list of existing errata under the Errata section of that title. Any existing errata can
be viewed by selecting your title from http://www.packtpub.com/support.


Piracy 


Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt,
we take the protection of our copyright and licenses very seriously. If you come across any
illegal copies of our works, in any form, on the Internet, please provide us with the location
address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content. 


Questions 


You can contact us at questions@packtpub.com if you are having a problem with any
aspect of the book, and we will do our best to address it.
1

Preparing Your 
Working Environment 


In this chapter, we will cover the following recipes:

► Installing matplotlib, NumPy, and SciPy

► Installing virtualenv and virtualenvwrapper

► Installing matplotlib on Mac OS X

► Installing matplotlib on Windows

► Installing Python Imaging Library (PIL) for image processing

► Installing a requests module 

► Customizing matplotlib's parameters in code 

► Customizing matplotlib's parameters per project 


Introduction 


This chapter introduces the reader to the essential tools and their installation and
configuration. This is necessary work and a common base for the rest of the book. If you have
never used Python for data and image processing and visualization, it is advised not to skip
this chapter. Even if you do skip it, you can always return to this chapter in case you need to
install some supporting tools or verify what version you need to support the current solution.








Installing matplotlib, NumPy, and SciPy


This recipe describes several ways of installing matplotlib and its required dependencies
under Linux.


Getting ready 


We assume that you already have Linux (preferably Debian/Ubuntu or RedHat/SciLinux)
installed and Python installed on it. Usually, Python is already installed on the mentioned
Linux distributions and, if not, it is easily installable through standard means. We assume
that Python 2.7+ is installed on your workstation.



Almost all code should work with Python 3.3+ versions, but since most
operating systems still deliver Python 2.7 (some even Python 2.6),
we decided to write the code for Python 2.7. The differences are
small, mainly in the versions of packages and some code (xrange
should be substituted with range in Python 3.3+).
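As a hedged illustration of the xrange/range difference (a common compatibility idiom, not code taken from this book's examples), a script can alias range to the lazy xrange when it exists, so the same loop runs unchanged on Python 2.7 and 3.3+:

```python
try:
    # Python 2: xrange is the lazy (non-list-building) version of range
    range = xrange  # noqa: F821
except NameError:
    # Python 3: range is already lazy and xrange no longer exists
    pass

# This loop behaves the same way on both versions
total = 0
for i in range(5):
    total += i
print(total)
```

The alias only affects the module it is defined in, so it is safe to drop into individual scripts.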


We also assume that you know how to use your OS package manager in order to install
software packages and know how to use a terminal.

The build requirements must be satisfied before matplotlib can be built.

matplotlib requires NumPy, libpng, and freetype as build dependencies. In order to be
able to build matplotlib from source, we must have installed NumPy. Here's how to do it:

Install NumPy (1.5+ if you want to use it with Python 3) from http://www.numpy.org/

NumPy will provide us with data structures and mathematical functions for working with large
datasets. Python's default data structures such as tuples, lists, or dictionaries are great
for insertions, deletions, and concatenation. NumPy's data structures support "vectorized"
operations, which apply one operation to a whole array at once, and are efficient both to write
and to execute. They are implemented with big data in mind and rely on C implementations
that allow efficient execution time.
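To make "vectorized" concrete, here is a minimal sketch contrasting a plain Python list comprehension with the equivalent NumPy array expression. The array size is arbitrary, and the explicit int64 dtype is an assumption chosen so that the squares cannot overflow on platforms where NumPy's default integer is 32 bits (for example, Windows with older NumPy releases):

```python
import numpy as np

N = 1000000

# Plain Python: an explicit loop (here a list comprehension) over every element
values = list(range(N))
squares_list = [v * v for v in values]

# NumPy: one vectorized expression; the element-wise loop runs in compiled C code
arr = np.arange(N, dtype=np.int64)
squares_arr = arr * arr

# Both approaches produce the same numbers
print(squares_list[:5])
print(squares_arr[:5])
```

Timing both versions with the standard timeit module is an easy way to see the difference on your own machine; the exact speedup depends on your hardware and NumPy version.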



SciPy, building on top of NumPy, is the de facto standard scientific and
numeric toolkit for Python, comprising a great selection of special functions
and algorithms, most of them actually implemented in C and Fortran, coming
from the well-known Netlib repository (http://www.netlib.org).


Perform the following steps for installing NumPy: 


1. Install the Python-NumPy package: 

sudo apt-get install python-numpy 
2. Check the installed version:

$ python -c 'import numpy; print numpy.__version__'

3. Install the required libraries: 

□ libpng 1.2: PNG files support (requires zlib) 

□ freetype 1.4+: TrueType font support

$ sudo apt-get build-dep python-matplotlib 

If you are using RedHat or a variation of this distribution (Fedora, SciLinux, or CentOS),
you can use yum to perform the same installation:

$ su -c 'yum-builddep python-matplotlib' 


How to do it... 


There are many ways one can install matplotlib and its dependencies: from source,
precompiled binaries, OS package manager, and with prepackaged Python distributions
with built-in matplotlib.

Most probably the easiest way is to use your distribution's package manager. For Ubuntu,
that should be:

# in your terminal, type: 

$ sudo apt-get install python-numpy python-matplotlib python-scipy 

If you want to be on the bleeding edge, the best option is to install from source. This path
comprises a few steps: get the source code, satisfy the build requirements, and configure,
compile, and install.

Download the latest source from the code host SourceForge by following these steps:

$ cd ~/Downloads/

$ wget https://downloads.sourceforge.net/project/matplotlib/matplotlib/matplotlib-1.4.3/matplotlib-1.4.3.tar.gz
$ tar xzf matplotlib-1.4.3.tar.gz
$ cd matplotlib-1.4.3
$ python setup.py build
$ sudo python setup.py install
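Once the installation finishes, a quick sanity check is to ask Python which versions it actually imports. This is a small sketch, and the helper name report_versions is hypothetical. It relies only on importlib and on the __version__ attribute that NumPy, SciPy, and matplotlib expose, and it reports a package as missing instead of crashing:

```python
import importlib

# Packages this chapter installs; adjust the list for your project
PACKAGES = ["numpy", "scipy", "matplotlib"]


def report_versions(names):
    """Return {name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Most scientific packages expose __version__; fall back if absent
            found[name] = getattr(module, "__version__", "unknown")
    return found


for name, version in sorted(report_versions(PACKAGES).items()):
    print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```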



How it Works...


We use the standard Python Distribution Utilities, known as Distutils, to install matplotlib from
the source code. This procedure requires us to install the dependencies beforehand, as we already
explained in the Getting ready section of this recipe. The dependencies are installed using the
standard Linux packaging tools.


There's more... 


There are more optional packages that you might want to install depending on what your data 
visualization projects are about. 

No matter what project you are working on, we recommend installing IPython, an interactive
Python shell where you can have matplotlib and related packages, such as NumPy and
SciPy, imported and ready to play with. Please refer to IPython's official site on how to install it
and use it; it is, though, very straightforward.


Installing virtualenv and virtualenvwrapper 


If you are working on many projects simultaneously, or even just switching between them
frequently, you'll find that having everything installed system-wide is not the best option and
can cause problems in the future on different systems (production) where you want to run your
software. That is not a good time to find out that you are missing a certain package or that
you're having versioning conflicts between packages that are already installed on the production
system; hence, virtualenv.

virtualenv is an open source project started by Ian Bicking that enables a developer to isolate
working environments per project, for easier maintenance of different package versions. 

For example, you may have inherited a legacy Django website based on Django 1.1 and Python 2.3,
but at the same time you are working on a new project that must be written in Python 2.6. This
is my usual case: having more than one required Python version (and related packages),
depending on the project I am working on.

virtualenv enables me to easily switch between different environments and have the same
packages easily reproduced if I need to switch to another machine or to deploy software to a
production server (or to a client's workstation).
Getting ready


To install virtualenv, you must have a working installation of Python and pip. Pip is a tool
for installing and managing Python packages, and it is a replacement for easy_install.
We will use pip throughout most of this book for package management. Pip is easily installed
by executing the following line in your terminal as root:

# easy_install pip

virtualenv by itself is really useful, but with the help of virtualenvwrapper, all this becomes
easy to do, and it is also easy to organize many virtual environments. See all the features at
http://virtualenvwrapper.readthedocs.org/en/latest/#features.


How to do it... 


By performing the following steps, you can install the virtualenv and virtualenvwrapper tools:

1. Install virtualenv and virtualenvwrapper:

$ sudo pip install virtualenv

$ sudo pip install virtualenvwrapper 

# Create a folder to hold all our virtual environments and export
# the path to it (virtualenvwrapper reads it from WORKON_HOME).

$ export WORKON_HOME=~/.virtualenvs
$ mkdir -p $WORKON_HOME

# We source (i.e. execute) the shell script to activate the wrappers
$ source /usr/local/bin/virtualenvwrapper.sh

# And create our first virtual environment
$ mkvirtualenv virt1

2. You can now install your favorite package inside virt1:

(virt1)user1:~$ pip install matplotlib

3. You will probably want to add the following line to your ~/.bashrc file:
source /usr/local/bin/virtualenvwrapper.sh

A few useful and frequently used commands are as follows:

► mkvirtualenv ENV: This creates a virtual environment with the name ENV
and activates it

► workon ENV: This activates the previously created ENV

► deactivate: This gets us out of the current virtual environment
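After a few workon and deactivate calls, it is easy to lose track of which interpreter a shell is using. The following is a minimal, best-effort sketch of a check you can run from Python. It assumes the usual conventions: classic virtualenv releases set sys.real_prefix, while the standard-library venv module (Python 3.3+) and newer virtualenv releases make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # Older virtualenv releases set sys.real_prefix
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases: base_prefix differs from prefix
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("inside a virtualenv" if in_virtualenv() else "using the system Python")
print("interpreter prefix: " + sys.prefix)
```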


pip not only provides you with a practical way of installing packages, but it is also a good
solution for keeping track of the Python packages installed on your system, as well as their
versions. The command pip freeze will print all the installed packages in your current
environment, followed by their version numbers:

$ pip freeze 
matplotlib==1.4.3
mock==1.0.1
nose==1.3.6
numpy==1.9.2
pyparsing==2.0.3
python-dateutil==2.4.2
pytz==2015.2
six==1.9.0
wsgiref==0.1.2

In this case, we see that even though we simply installed matplotlib, many other packages
are also installed. Apart from wsgiref, which is used by pip itself, these are required
dependencies of matplotlib which have been automatically installed.
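Because every line of this output has the form name==version, it is easy to read back programmatically. The following sketch turns freeze output (here a trimmed copy of the listing above) into a dictionary. The helper parse_freeze is hypothetical and not part of pip. It deliberately ignores editable installs and other requirement formats.

```python
def parse_freeze(text):
    """Turn `pip freeze` output into a {package: version} dictionary.

    Only plain `name==version` lines are handled; editable installs (-e ...),
    comments, and other requirement forms are skipped in this sketch.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Package names are case-insensitive on PyPI, so normalize them
        pins[name.strip().lower()] = version.strip()
    return pins


frozen = """matplotlib==1.4.3
numpy==1.9.2
pyparsing==2.0.3
wsgiref==0.1.2
"""
pins = parse_freeze(frozen)
print(pins["numpy"])  # 1.9.2
```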

When transferring a project from one environment (possibly a virtual environment) to another,
the receiving environment needs to have all the necessary packages installed (in the same
versions as in the original environment) in order to be sure that the code can be properly run.
This can be problematic as two different environments might not contain the same packages,
and, worse, might contain different versions of the same package. This can lead to conflicts
or unexpected behaviors in the execution of the program.

In order to avoid this problem, pip freeze can be used to save a copy of the current
environment configuration. The following command saves the output of pip freeze to the file
requirements.txt:

$ pip freeze > requirements.txt 

In a new environment, this file can be used to install all the required libraries. Simply run:

$ pip install -r requirements.txt 

All the necessary packages will automatically be installed in their specified versions. That way,
we ensure that the environment where the code is used is always the same. It is good
practice to have a virtual environment and a requirements.txt file for every project you
are developing. Therefore, before installing the required packages, it is advised that you first
create a new virtual environment to avoid conflicts with other projects.
The overall workflow from one machine to another is therefore:

► On machine 1: 

$ mkvirtualenv env1

(env1)$ pip install matplotlib

(env1)$ pip freeze > requirements.txt

► On machine 2: 

$ mkvirtualenv env2 

(env2)$ pip install -r requirements.txt 
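If the two machines should be identical, you may want to confirm it rather than assume it. Here is a minimal sketch that assumes each machine's pip freeze output has already been loaded into a {package: version} dictionary. The helper compare_pins is hypothetical, and the versions shown for machine 2 are invented for illustration.

```python
def compare_pins(source, target):
    """Compare two {package: version} dictionaries.

    Returns a tuple (missing, mismatched):
      missing    -- sorted list of packages pinned in source but absent from target
      mismatched -- {package: (source_version, target_version)} where they differ
    """
    missing = sorted(set(source) - set(target))
    mismatched = {}
    for name in sorted(set(source) & set(target)):
        if source[name] != target[name]:
            mismatched[name] = (source[name], target[name])
    return missing, mismatched


# Hypothetical pins: machine 2 lacks six and has an older numpy
machine1 = {"matplotlib": "1.4.3", "numpy": "1.9.2", "six": "1.9.0"}
machine2 = {"matplotlib": "1.4.3", "numpy": "1.8.1"}

missing, mismatched = compare_pins(machine1, machine2)
print(missing)     # ['six']
print(mismatched)  # {'numpy': ('1.9.2', '1.8.1')}
```

An empty list and an empty dictionary mean the target environment matches the source pins.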


Installing matplotlib on Mac OS X 


The easiest way to get matplotlib on Mac OS X is to use a prepackaged Python distribution
such as Enthought Python Distribution (EPD). Just go to the EPD site, and download and
install the latest stable version for your OS.

In case you are not satisfied with EPD or cannot use it for other reasons, such as the versions
distributed with it, there is a manual (read: harder) way of installing Python, matplotlib, and its
dependencies.


Getting ready 


We will use the Homebrew project (you could also use MacPorts in the same way), which eases
the installation of all the software that Apple did not install on your OS, including Python and
matplotlib. Under the hood, Homebrew is a set of Ruby scripts and Git repositories that automate
downloading and installation. Following these instructions should get the installation working.
First, we will install Homebrew, and then Python, followed by tools such as virtualenv, then
dependencies for matplotlib (NumPy and SciPy), and finally matplotlib. Hold on, here we go.


How to do it... 


1. In your terminal, paste and execute the following command: 

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

After the command finishes, try running brew update or brew doctor to verify that the
installation is working properly.
2. Next, add the Homebrew directory to your system path, so the packages you install
using Homebrew have greater priority than other versions. Open ~/.bash_profile
(or /Users/[your-user-name]/.bash_profile) and add the following line to
the end of the file:

export PATH=/usr/local/bin:$PATH 

3. You will need to restart the terminal so that it picks up the new path. Installing Python is as
easy as firing up another one-liner:

brew install python --framework --universal 

This will also install any prerequisites required by Python. 

4. Now, you need to update your path (add to the same line): 

export PATH=/usr/local/share/python:/usr/local/bin:$PATH 

5. To verify that the installation has worked, type python --version in the command
line; you should see 2.7.3 as the version number in the response.

6. You should have pip installed by now. In case it is not installed, use easy_install
to add pip:

$ easy_install pip 

7. Now, it's easy to install any required package; for example, virtualenv and
virtualenvwrapper are useful:

pip install virtualenv 

pip install virtualenvwrapper 

8. The next step is to install the dependencies for matplotlib, NumPy and SciPy (building SciPy requires the gfortran Fortran compiler):

pip install numpy 
brew install gfortran 
pip install scipy 

9. Verify that everything is working. Call Python and execute the following commands:
import numpy

print numpy.__version__

import scipy

print scipy.__version__

quit()

10. Install matplotlib:

pip install matplotlib
Installing matplotlib on Windows


In this recipe, we will demonstrate how to install Python and start working with a matplotlib
installation. We assume Python was not previously installed.


Getting ready 


There are two ways of installing matplotlib on Windows. The easiest way is by installing
prepackaged Python environments, such as EPD, Anaconda, SageMath, and Python(x,y).
This is the suggested way to install Python, especially for beginners.

The second way is to install everything using binaries of precompiled matplotlib and required
dependencies. This is more difficult as you have to be careful about the versions of NumPy
and SciPy you are installing, as not every version is compatible with the latest version of
matplotlib binaries. The advantage of this approach is that you can even compile your own
versions of matplotlib or any other library to get the latest features, even if binaries for them
are not provided by the authors.


How to do it... 


The suggested way of installing free or commercial Python scientific distributions is as easy as
following the steps provided on the project's website.

If you just want to start using matplotlib and don't want to be bothered with Python versions
and dependencies, you may want to consider using the Enthought Python Distribution (EPD).
EPD contains prepackaged libraries required to work with matplotlib and all the required
dependencies (SciPy, NumPy, IPython, and more).

As usual, we download the Windows installer (*.exe) that will install all the code we need to start
using matplotlib and all the recipes from this book.

There is also a free scientific project, Python(x,y) (http://python-xy.github.io), for
Windows 32-bit systems that has all dependencies resolved, and is an easy (and free!)
way of installing matplotlib on Windows. Since Python(x,y) is compatible with Python module
installers, it can be easily extended with other Python libraries. No Python installation should
be present on the system before installing Python(x,y).
Let me briefly explain how we would install matplotlib using precompiled Python, NumPy,
SciPy, and matplotlib binaries:

1. First, we download and install standard Python using the official .msi installer for
our platform (x86 or x86-64).

2. After that, download the official binaries for NumPy and SciPy and install them first.

3. When you are sure that NumPy and SciPy are properly installed, download
the latest stable release binary for matplotlib and install it by following the official
instructions.


There's more... 


Note that many examples are not included in the Windows installer. If you want to try the
demos, download the matplotlib source and look in the examples subdirectory.


Installing Python Imaging Library (PIL) for
image processing


Python Imaging Library (PIL) enables image processing using Python. It has extensive file
format support and is powerful enough for image processing.

Some popular features of PIL are fast access to data, point operations, filtering, image resizing, 
rotation, and arbitrary affine transforms. For example, the histogram method allows us to get 
statistics about the images.

PIL can also be used for other purposes, such as batch processing, image archiving, creating
thumbnails, conversion between image formats, and printing images.

PIL reads a large number of formats, while write support is (intentionally) restricted to the 
most commonly used interchange and presentation formats. 
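Here is a short sketch of several of these operations: resizing, rotation, thumbnails, histograms, and format conversion. It assumes Pillow, the PIL fork described in this recipe, which is imported under the same package name, PIL. A generated solid-red image stands in for a real photo, so the example needs no input file.

```python
from io import BytesIO

from PIL import Image

# A small solid-red RGB image stands in for a real photo
img = Image.new("RGB", (64, 48), color=(255, 0, 0))

# Geometric operations return new images
rotated = img.rotate(45, expand=True)  # expand=True enlarges the canvas to fit
resized = img.resize((32, 24))

# thumbnail() modifies the image in place and keeps the aspect ratio
thumb = img.copy()
thumb.thumbnail((16, 16))

# histogram() returns pixel counts: 256 bins per band, so 768 for RGB
hist = img.histogram()
red_counts = hist[0:256]

# Format conversion: grayscale, then encode as PNG into an in-memory buffer
buffer = BytesIO()
img.convert("L").save(buffer, format="PNG")

print("rotated: %s, resized: %s, thumbnail: %s" % (rotated.size, resized.size, thumb.size))
print(red_counts[255])  # every pixel has a red value of 255, so 64 * 48 = 3072
```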


How to do it... 


The easiest and most recommended way is to use your platform's package managers. For
Debian and Ubuntu, use the following commands:

$ sudo apt-get build-dep python-imaging 

$ sudo pip install http://effbot.org/downloads/Imaging-1.1.7.tar.gz
How it Works...


This way, we satisfy all the build dependencies using the apt-get system but also install
the latest stable release of PIL. Some older versions of Ubuntu usually don't provide the
latest releases.

On RedHat and SciLinux systems, run the following commands: 

# yum install python-imaging 

# yum install freetype-devel 

# pip install PIL 


There's more... 


There is a good online handbook specifically for PIL. You can read it at
http://www.pythonware.com/library/pil/handbook/index.htm or download the PDF version
from http://www.pythonware.com/media/data/pil-handbook.pdf.

There is also a PIL fork, Pillow, whose main aim is to fix installation issues. Pillow can be found
at http://pypi.python.org/pypi/Pillow and it is easy to install (at the time of writing,
Pillow is the only choice if you are using OS X).

On Windows, PIL can also be installed using a binary installation file. Install PIL in your Python
site-packages by executing the .exe from http://www.pythonware.com/products/pil/.

Now, if you want to use PIL in a virtual environment, manually copy the PIL.pth file and the
PIL directory at C:\Python27\Lib\site-packages to your virtualenv site-packages
directory.


Installing a requests module 


Most of the data that we need now is available over HTTP or similar protocols, so we need
something to get it. The Python library requests makes the job easy.

Even though Python comes with the urllib2 module for working with remote resources and
supporting HTTP capabilities, it requires a lot of work to get the basic tasks done.

The requests module brings a new API that makes the use of web services seamless and pain
free. Most of the HTTP 1.1 details are hidden away and exposed only if you need them to behave
differently from the default.
How to do it...


Using pip is the best way to install requests. Use the following command:

$ pip install requests 

That's it. This can also be done inside your virtualenv, if you don't need requests for every
project or want to support different requests versions for each project.

Just to get you ahead quickly, here's a small example of how to use requests:

import requests 

r = requests.get('http://github.com/timeline.json') 
print r.content 


How it Works... 


We sent the GET HTTP request to a URI at www.github.com that returns a JSON-formatted
timeline of activity on GitHub (you can see the HTML version of that timeline at
https://github.com/timeline). After the response is successfully read, the r object contains the
content and other properties of the response (response code, cookies set, header metadata, and
even the request we sent in order to get this response).
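The request mentioned above is available on the response as r.request, a prepared request object. To see what requests actually sends without touching the network, you can build and prepare a request yourself. In this sketch, the URL, query parameter, and header are placeholders, and the sending step is left commented out.

```python
import requests

# Build (but do not send) a GET request so we can inspect what requests
# would put on the wire. The URL, parameter, and header are placeholders.
req = requests.Request(
    "GET",
    "https://example.com/timeline.json",
    params={"page": 2},
    headers={"Accept": "application/json"},
)
prepared = req.prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://example.com/timeline.json?page=2
print(prepared.headers["Accept"])

# Sending it, with a timeout so that an unresponsive server cannot hang the script:
# with requests.Session() as session:
#     r = session.send(prepared, timeout=10)
#     r.raise_for_status()
#     print(r.status_code)
```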


Customizing matplotlib's parameters in code 


The library we will use the most throughout this book is matplotlib; it provides the plotting
capabilities. Default values for most properties are already set inside the configuration file
for matplotlib, called the .rc file. This recipe describes how to modify matplotlib properties from
our application code.


Getting ready 


As we already said, matplotlib configuration is read from a configuration file. This file provides
a place to set up permanent default values for certain matplotlib properties (well, for almost
everything in matplotlib).


How to do it... 


There are two ways to change parameters during code execution: using the dictionary of
parameters (rcParams) or calling the matplotlib.rc() command. The former enables
us to load an already existing dictionary into rcParams, while the latter enables a call to a
function with a group name and keyword arguments.








If we want to restore the dynamically changed parameters, we can call
matplotlib.rcdefaults() to restore the standard matplotlib settings.

The following two code samples illustrate the previously explained behaviors:
► An example for matplotlib.rcParams:

import matplotlib as mpl 

mpl.rcParams['lines.linewidth'] = 2 

mpl.rcParams['lines.color'] = 'r' 

► An example for the matplotlib.rc() call:

import matplotlib as mpl 

mpl.rc('lines', linewidth=2, color='r')

Both examples are semantically the same. In the second sample, we define that all
subsequent plots will have lines with a line width of 2 points. The last statement of the
previous code defines that the color of every line following this statement will be red,
unless we override it with local settings. See the following example:

import matplotlib.pyplot as plt 
import numpy as np 

t = np.arange(0.0, 1.0, 0.01) 

s = np.sin(2 * np.pi * t) 

# make line red 

plt.rcParams['lines.color' ] = 'r' 
plt.plot(t,s) 

c = np.cos(2 * np.pi * t) 

# make line thick 

plt.rcParams['lines.linewidth'] = '3' 
plt.plot(t,c) 

plt.show()
How it Works...


First, we import matplotlib.pyplot and NumPy to allow us to draw sine and cosine
graphs. Before plotting the first graph, we explicitly set the line color to red using the
plt.rcParams['lines.color'] = 'r' command.

Next, we go to the second graph (cosine function) and explicitly set the line width to three
points using the plt.rcParams['lines.linewidth'] = '3' command.

If we want to reset all settings to their defaults, we should call matplotlib.rcdefaults().

In this recipe, we have seen how to customize the style of a matplotlib chart by dynamically
changing its configuration parameters. The matplotlib.rcParams object is the interface
that we used to modify the parameters. It's global to the matplotlib package, and any change
that we apply to it affects all the charts that we draw afterward.
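One caveat about the lines.color example: in matplotlib 2.0 and later, plot() normally takes its line colors from the axes.prop_cycle parameter, so setting lines.color alone may leave the sine line in the default color. The following sketch, assuming a recent matplotlib together with the cycler package it depends on, sets the color cycle instead. It also wraps the changes in matplotlib.rc_context(), so they apply only inside one block rather than globally:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

t = np.arange(0.0, 1.0, 0.01)
default_width = matplotlib.rcParams["lines.linewidth"]

# Settings passed to rc_context() apply only inside the with-block.
# Assumption: plot() picks colors from axes.prop_cycle (matplotlib >= 2.0).
with matplotlib.rc_context({"axes.prop_cycle": cycler(color=["r"]),
                            "lines.linewidth": 3}):
    fig, ax = plt.subplots()
    (sine_line,) = ax.plot(t, np.sin(2 * np.pi * t))

print(sine_line.get_color(), sine_line.get_linewidth())
# Outside the block, the global value is back to what it was before.
print(matplotlib.rcParams["lines.linewidth"] == default_width)
plt.close(fig)
```

The line keeps the red color and width it was created with, while later charts use the previous global settings again.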


Customizing matplotlib's parameters per 
project 


This recipe explains where the various configuration files that matplotlib uses are located and why we
would want to use one or the other. Also, we explain what is in these configuration files.


Getting ready 


If you don't want to configure matplotlib as the first step in your code every time you use
it (as we did in the previous recipe), this recipe will explain how to have different default
configurations of matplotlib for different projects. This way your code will not be cluttered
with configuration data and, moreover, you can easily share configuration templates with
your co-workers or even among other projects.


How to do it... 


If you have a working project that always uses the same settings for certain parameters
in matplotlib, you probably don't want to set them every time you add code for a new
graph. Instead, what you want is a permanent file, outside of your code, which sets defaults
for matplotlib parameters.

matplotlib supports this via its matplotlibrc configuration file that contains most of the 
changeable properties of matplotlib. 
How it Works...


There are three different places where this file can reside and its location defines its usage. 

They are: 

► Current working directory: This is where your code runs from. This is the place to
customize matplotlib just for your current directory that might contain your current
project code. The file is named matplotlibrc.

► Per user .matplotlib/matplotlibrc: This is usually in the user's $HOME directory
(under Windows, this is your Documents and Settings directory). You can find
out where your configuration directory is using the matplotlib.get_configdir()
command. See the one-liner that follows this list.

► Per installation configuration file: This is usually in your Python site-packages.

This is a system-wide configuration, but it will get overwritten every time you reinstall
matplotlib; so, it is better to use a per user configuration file for more persistent
customizations. The best usage so far for me was to use this as a default template,
if I mess up my user's configuration file or if I need fresh configuration to customize
for a different project.

The following one-liner will print the location of your configuration directory and can be run
from a shell:

$ python -c 'import matplotlib as mpl; print(mpl.get_configdir())'
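If you are not sure which of these files matplotlib actually picked up, a short script can ask matplotlib directly. This is a minimal sketch that assumes a matplotlib version providing get_configdir() and matplotlib_fname(), which current releases do:

```python
import matplotlib as mpl

# Per-user configuration directory (where a per-user matplotlibrc would live).
config_dir = mpl.get_configdir()

# Path of the matplotlibrc file that was actually loaded: a project-level or
# per-user file if one exists, otherwise the copy installed with matplotlib.
active_rc = mpl.matplotlib_fname()

print("configuration directory:", config_dir)
print("matplotlibrc in use:", active_rc)
```

Running it from your project directory is a quick way to confirm that a project-level matplotlibrc is being used.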

The configuration file contains settings for: 

► axes: This deals with face and edge color, tick sizes, and grid display.

► backend: This sets the target output, such as TkAgg or GTKAgg.

► figure: This deals with dpi, edge color, figure size, and subplot settings.

► font: This looks at font families, font size, and style settings. 

► grid: This deals with grid color and line settings.

► legend: This specifies how legends and text inside will be displayed. 

► lines: This checks for line (color, style, width, and so on) and markers settings. 

► patch: These patches are graphical objects that fill 2D space, such as polygons
and circles; set linewidth, color, antialiasing, and so on.

► savefig: There are separate settings for saved figures, for example, to make
rendered files with a white background.

► text: This looks for text color, how to interpret text (plain versus LaTeX markup),
and similar.
► verbose: This checks how much information matplotlib gives during runtime: silent,
helpful, debug, and debug-annoying.

► xticks and yticks: These set the color, size, direction, and label size for major and 
minor ticks for the x and y axes. 
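To illustrate the file format, the following sketch writes a tiny, hypothetical project-level matplotlibrc containing a few of these settings and reads it back with matplotlib.rc_params_from_file(), without touching the global settings. The file contents and the temporary directory are assumptions made for the example; in a real project, the file would simply sit in the directory the code runs from:

```python
import os
import tempfile

import matplotlib as mpl

# Hypothetical project-level matplotlibrc: "key : value" per line, "#" starts a comment.
RC_TEXT = """\
lines.linewidth : 3     # thicker lines for this project
axes.grid       : True  # always draw a grid
font.size       : 12
"""

with tempfile.TemporaryDirectory() as project_dir:
    rc_path = os.path.join(project_dir, "matplotlibrc")
    with open(rc_path, "w") as f:
        f.write(RC_TEXT)
    # Parse only the settings in this file; the global rcParams stay untouched.
    # (mpl.rc_file(rc_path) would instead apply them to the running session.)
    params = mpl.rc_params_from_file(rc_path, use_default_template=False)

print(params["lines.linewidth"], params["axes.grid"], params["font.size"])
```

Note that the values come back validated and converted: the line width and font size become floats, and the grid flag becomes a boolean.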


There's more... 


If you are interested in more details for every mentioned setting (and some that we did not
mention here), the best place to go is the website of the matplotlib project where there is
up-to-date API documentation. If it doesn't help, user and development lists are always
good places to leave questions. See the back of this book for useful online resources.
Knowing Your Data


In this chapter, we'll cover the following topics:

► Importing data from CSV 

► Importing data from Microsoft Excel files

► Importing data from fixed-width data files

► Importing data from tab-delimited files

► Importing data from a JSON resource

► Exporting data to JSON, CSV, and Excel

► Importing and manipulating data with Pandas

► Importing data from a database

► Cleaning up data from outliers

► Reading files in chunks

► Reading streaming data sources

► Importing image data into NumPy arrays

► Generating controlled random datasets

► Smoothing the noise in real-world data


Introduction 


This chapter covers the basics of importing and exporting data from various formats.

We first introduce how to import data by using only the capabilities of the Python
standard library; then we introduce the powerful Pandas library, which is becoming the
de facto standard in data manipulation in Python. We also cover ways of cleaning
data, such as normalizing values, adding missing data, live data inspection, and the use of
some similar tricks to get data correctly prepared for visualization.
Importing data from CSV


In this recipe, we'll work with the most common file format that you will encounter in the wild
world of data: CSV. It stands for Comma Separated Values, which almost explains all the
formatting there is. (There is also a header part of the file, but those values are also comma
separated.)

Python has a module called csv that supports reading and writing CSV files in various
dialects. Dialects are important because there is no standard CSV, and different applications
implement CSV in slightly different ways. A file's dialect is almost always recognizable from a
first look at the file.


Getting ready 


What we need for this recipe is the CSV file itself. We'll use sample CSV data that you can
download from ch02-data.csv.

We assume that sample data files are in the same folder as the code reading them. 


How to do it... 


The following code example demonstrates how to import data from a CSV file. We will perform 
the following steps for this: 

1. Open the ch02-data.csv file for reading.

2. Read the header first. 

3. Read the rest of the rows. 

4. In case there is an error, catch the exception, report it, and exit.

5. After reading everything, print the header and the rest of the rows. 

This is shown in the following code: 
import csv
import sys

filename = 'ch02-data.csv'

data = []
try:
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        data = [row for row in reader]
except csv.Error as e:
    print("Error reading CSV file at line %s: %s" % (reader.line_num, e))
    sys.exit(-1)

if header:
    print(header)
    print('==================')

for datarow in data:
    print(datarow)


How it Works... 


First, we import the csv module in order to enable access to the required methods. Then, 
we open the file with data using the with compound statement and bind it to the object f.

The with statement, acting as a context manager, relieves us of having to close the resource after we
are finished manipulating it. It is a very handy way of working with resource-like
files because it makes sure that the resource is freed (for example, that the file is closed) after
the block of code is executed over it.

Then, we use the csv.reader() function that returns the reader object, which allows us
to iterate over all rows of the read file. Every row is just a list of values and is printed inside
the loop.

Reading the first row is somewhat different as it is the header of the file and describes the 
data in each column. This is not mandatory for CSV files and some files don't have headers, 
but they are a really nice way of providing minimal metadata about datasets. Sometimes,
though, you will find separate text or even CSV files that are just used as metadata describing
the format and additional data about the data.

The simplest way to check what the first line looks like is to open the file and visually inspect it
(for example, look at the first few lines of the file). This can be done efficiently on Linux using
bash commands like head, as shown here:

$ head somefile.csv 

During iteration of data, we save the first row in header while we add every other row to the 
data list. 

We can also check whether the .csv file has a header or not using the has_header() method of the csv.Sniffer class.

Should any errors occur during reading, the reader will raise a csv.Error exception that we can
catch, and then print a helpful message to the user in order to help detect errors.
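The header check can be combined with dialect detection. The following is a minimal sketch, not the recipe's own code: it assumes the CSV text is already in memory, and it relies on csv.Sniffer, whose has_header() is only a heuristic (it compares the types of the first row with those of the remaining rows), so it can guess wrong on unusual files. Note also that sniff() raises csv.Error if it cannot determine a delimiter.

```python
import csv
import io

def read_csv_text(text):
    """Parse CSV text, letting csv.Sniffer guess the dialect and the header."""
    sample = text[:1024]              # the sniffer only needs a small sample
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample)   # raises csv.Error if it cannot decide
    reader = csv.reader(io.StringIO(text), dialect)
    header = next(reader) if sniffer.has_header(sample) else None
    return header, list(reader)

header, rows = read_csv_text("name,age\nalice,30\nbob,25\ncarol,41\n")
print(header)   # ['name', 'age']
print(rows)     # [['alice', '30'], ['bob', '25'], ['carol', '41']]
```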
There's more...


If you want to read about the background and reasoning for the csv module, the PEP-defined
document CSV File API is available at http://www.python.org/dev/peps/pep-0305/.

If you have larger files that you want to load, it's often better to use well-known libraries like
NumPy's loadtxt() that cope better with large CSV files.

The basic usage is simple as shown in the following code snippet: 

import numpy 

data = numpy.loadtxt('ch02-data.csv', dtype=str, delimiter=',')

Note that we need to define a delimiter to instruct NumPy to separate our data as appropriate.
The function numpy.loadtxt() is somewhat faster than the similar function
numpy.genfromtxt(), but the latter can cope better with missing data, and you are able to provide
functions to express what is to be done during the processing of certain columns of loaded
data files.
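To see the difference in how missing data is handled, here is a small sketch that uses an in-memory string instead of a file (the three-row sample is made up and NumPy is assumed to be installed). numpy.loadtxt() would typically reject the empty field in the second row with a ValueError, whereas numpy.genfromtxt() fills it in:

```python
import io

import numpy as np

text = "1,2,3\n4,,6\n7,8,9\n"   # the second row has an empty (missing) field

# By default, genfromtxt() marks missing values as nan in float columns...
with_nan = np.genfromtxt(io.StringIO(text), delimiter=",")
# ...or it substitutes a value of our choice via filling_values.
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=0)

print(with_nan[1])
print(filled[1])
```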



In Python 2, the csv module doesn't support Unicode, and so you
must explicitly convert the read data into UTF-8 or printable ASCII.
The official Python CSV documentation offers good examples of
how to resolve data encoding issues.

In Python 3, Unicode support is the default and
there are no such issues.
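In Python 3, the usual way to avoid encoding surprises is to state the encoding when opening the file and to pass newline='', as the csv documentation recommends. Here is a minimal sketch that assumes UTF-8 files and uses a hypothetical file in a temporary directory:

```python
import csv
import os
import tempfile

rows = [["name", "city"], ["José", "São Paulo"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "unicode-sample.csv")  # hypothetical file name
    # newline="" lets the csv module handle line endings itself;
    # the encoding is explicit instead of relying on the platform default.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    with open(path, newline="", encoding="utf-8") as f:
        read_back = list(csv.reader(f))

print(read_back)
```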


Importing data from Microsoft Excel files 


Although Microsoft Excel supports some charting, sometimes you need more flexible and
powerful visualization and need to export data from existing spreadsheets into Python for
further use.

A common approach to importing data from Excel files is to export data from Excel into
CSV-formatted files and use the tools described in the previous recipe to import data using
Python from the CSV file. This is a fairly easy process if we have one or two files (and have
Microsoft Excel or OpenOffice.org installed), but if we are automating a data pipe for many
files (as part of an ongoing data processing effort), we are not in a position to manually
convert every Excel file into CSV. So, we need a way to read any Excel file.

Python has decent support for reading and writing Excel files through the project
www.python-excel.org. This support is available in the form of different modules
for reading and writing and is platform-independent; in other words, we don't have to
run it on Windows in order to read Excel files.
The Microsoft Excel file format changed over time, and support for different versions is
available in different Python libraries. The latest stable version of xlrd is 0.9.0 at the
time of this writing and it has support for reading .xlsx files. Note that xlrd 2.0 and later
releases read only the legacy .xls format, so reading .xlsx files with xlrd requires an older release.


Getting ready 


First, we need to install the required module. For this example, we will use the module xlrd.
We will use pip in our virtual environment, as shown in the following code:

$ mkvirtualenv xlrdexample 
(xlrdexample)$ pip install xlrd 

After successful installation, use the sample file ch02-xlsxdata.xlsx.


How to do it... 


The following code example demonstrates how to read a sample dataset from a known
Excel file. We will do this as shown in the following steps:

1. Open the file workbook.

2. Find the sheet by name.

3. Read the cells using the number of rows (nrows) and columns (ncols).

4. For demonstration purposes, we only print the read dataset.

This is shown in the following code: 
import xlrd
from pprint import pprint

file = 'ch02-xlsxdata.xlsx'
wb = xlrd.open_workbook(filename=file)
ws = wb.sheet_by_name('Sheet1')
dataset = []

# walk every row, collecting the value of each cell in that row
for r in range(ws.nrows):
    col = []
    for c in range(ws.ncols):
        col.append(ws.cell(r, c).value)
    dataset.append(col)

pprint(dataset)
How it Works...


Let's try to explain the simple object model that xlrd uses. At the top level, we have a
workbook (the Python class xlrd.book.Book) that consists of one or more worksheets
(xlrd.sheet.Sheet), and every sheet has cells (xlrd.sheet.Cell) from which we
can then read the values.

We load a workbook from a file using open_workbook(), which returns the xlrd.book.Book
instance that contains all the information about a workbook, such as its sheets. We access
sheets using sheet_by_name(); if we need all sheets, we could use sheets(), which
returns a list of the xlrd.sheet.Sheet instances. The xlrd.sheet.Sheet class has
the number of columns and rows as attributes (ncols and nrows) that we can use to infer ranges for our loop
to access every particular cell inside a worksheet using the method cell(). There is an
xlrd.sheet.Cell class, though it is not something we want to use directly.

Note that the date is stored as a floating point number and not as a separate data type,
but the xlrd module is able to inspect the value and try to infer if the data is in fact a date.
So, we can inspect the type of the cell and, if it is a date, convert it to a Python datetime object. The module xlrd
will return xlrd.XL_CELL_DATE as the cell type if the number format string looks like a date.
Here is a snippet of code that demonstrates this:

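from datetime import datetime

import xlrd
from xlrd import open_workbook, xldate_as_tuple

book = open_workbook('ch02-xlsxdata.xlsx')
sheet = book.sheet_by_name('Sheet1')

cell = sheet.cell(1, 0)
print(cell)
print(cell.value)
print(cell.ctype)

if cell.ctype == xlrd.XL_CELL_DATE:
    # book.datemode tells whether the workbook uses the 1900 or 1904 date system
    date_value = xldate_as_tuple(cell.value, book.datemode)
    print(datetime(*date_value))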

Date handling still has issues, so please refer to the official documentation and mailing list in case
you require extensive work with dates.
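xldate_as_tuple() does the conversion for us, but the arithmetic behind it is simple enough to sketch with the standard library alone. The following helper is hypothetical and only approximates what xlrd does. It assumes either the 1900-based date system (datemode 0, with the epoch taken as 30 December 1899, valid from 1 March 1900 onward) or the 1904-based one (datemode 1), and it skips the validation that xlrd performs:

```python
from datetime import datetime, timedelta

def excel_serial_to_datetime(serial, datemode=0):
    """Convert an Excel date serial number to a datetime (hypothetical helper).

    datemode has the same meaning as xlrd's book.datemode: 0 is the 1900-based
    system, 1 the 1904-based one. Using 1899-12-30 as the 1900 epoch absorbs
    Excel's fictitious 29 February 1900, so serials before 61 (1 March 1900)
    are not handled correctly.
    """
    epoch = datetime(1904, 1, 1) if datemode == 1 else datetime(1899, 12, 30)
    # The integer part counts days; the fraction is the time of day.
    return epoch + timedelta(days=serial)

print(excel_serial_to_datetime(43831.5))  # 2020-01-01 12:00:00
```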


There's more... 


A neat feature of xlrd is its ability to load only the parts of the file that are required into
memory. There is an on_demand parameter that can be passed as True while
calling open_workbook so that a worksheet will only be loaded when requested.
See the following code snippet for this:

from xlrd import open_workbook

book = open_workbook('large.xls', on_demand=True)
We didn't mention writing Excel files in this section partly because there will be a separate
recipe for that and partly because there is a different module for that: xlwt. You will read
more about it in the Exporting data to JSON, CSV, and Excel recipe in this chapter.

If you need specific usage that was not covered with the module and examples explained
earlier, here is a list of other Python modules on PyPI that might help you out with
spreadsheets: http://pypi.python.org/pypi?:action=browse&c=377.


Importing data from fixed-width data files 


Log files from events and time series data files are common sources for data visualizations.
Sometimes, we can read them using a CSV dialect for tab-separated data, but sometimes they
are not separated by any specific character. Instead, fields are of fixed widths and we can infer
the format to match and extract data.

One way to approach this is to read a file line by line and then use string manipulation
functions to split a string into separate parts. This approach seems straightforward,
and if performance is not an issue, it should be tried first.

If performance is more important or the file to parse is large (hundreds of megabytes), using
the Python module struct (http://docs.python.org/library/struct.html) can
speed us up as the module is implemented in C rather than in Python.


Getting ready 


As the module struct is part of the Python standard library, we don't need to install any
additional software to implement this recipe.


How to do it... 


We will use a pregenerated dataset with a million rows of fixed-width records. Here's what 
sample data looks like: 


207152670 3984356804116 9532
427053180 1466959270421 5338
316700885 9726131532544 4920
138359697 3286515244210 7400
476953136 0921567802830 4214
213420370 6459362591178 0546


This dataset is generated using code that can be found in the repository for this chapter:
ch02-generate_f_data.py.




Now we can read the data. We can use the following code sample. We will carry out the 
following steps for this: 

1. Define the data file to read. 

2. Define the mask for how to read the data. 

3. Read line by line using the mask to unpack each line into separate data fields. 

4. Print each line as separate fields. 

This is shown in the following code snippet: 

import struct

datafile = 'ch02-fixed-width-1M.data'

# this is where we define how to
# understand line of data from the file
mask = '9s14s5s'

# struct works on bytes, so the file is opened in binary mode
with open(datafile, 'rb') as f:
    for line in f:
        fields = struct.Struct(mask).unpack_from(line)
        print('fields: ', [field.strip().decode('ascii') for field in fields])


How it Works... 


We define our format mask according to what we have previously seen in the data file.

To see the file, we could have used Linux shell commands such as head or more or
something similar.

String formats are used to define the expected layout of the data to extract. We use format
characters to define what type of data we expect. So if the mask is defined as 9s14s5s, we
can read that as "a string of nine characters, followed by a string of 14 characters,
and then again followed by a string of five characters."

In general, c defines the character (the char type in C) or a string of length 1, s defines
a string (the char[] type in C), d defines a float (the double type in C), and so on. The
complete table is available on the official Python website at
http://docs.python.org/library/struct.html#format-characters.

We then read the file line by line and extract (the unpack_from method) the line according
to the specified format. Because we might have extraneous spaces before (or after) our fields,
we use strip() to strip every extracted field.
For unpacking, we used the object-oriented (OO) approach using the struct.Struct
class, but we could have as well used the non-object approach where the line would be
as shown here:

fields = struct.unpack_from(mask, line)

The only difference is the usage of pattern. If we are to perform more processing using
the same formatting mask, the OO approach saves us from stating that format in every
call. Moreover, it gives us the ability to inherit the struct.Struct class in the future, thus
extending or providing additional functionality for specific needs.
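The two unpacking styles can be tried on one of the sample rows shown earlier. This sketch assumes Python 3, where struct works on bytes rather than text, so each line is encoded before unpacking and each field is decoded afterward:

```python
import struct

MASK = '9s14s5s'               # 9 + 14 + 5 = 28 bytes per record
record = struct.Struct(MASK)   # OO approach: the format is compiled once

def parse_line(line):
    """Split one fixed-width text line into stripped string fields."""
    raw = line.encode('ascii')         # struct operates on bytes
    fields = record.unpack_from(raw)
    # Non-object equivalent: struct.unpack_from(MASK, raw)
    return [field.decode('ascii').strip() for field in fields]

print(record.size)                                  # 28
print(parse_line('207152670 3984356804116 9532'))  # ['207152670', '3984356804116', '9532']
```

The record.size attribute is handy for checking that the mask really matches the line length seen in the data file.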


Importing data from tab-delimited files 


Another very common format of flat data file is the tab-delimited file. This can also come from
an Excel export but can be the output of some custom software we must get our input from.

The good thing is that usually this format can be read in almost the same way as CSV files, as the
Python module csv supports the so-called dialects that enable us to use the same principles to
read variations of similar file formats, one of them being the tab-delimited format.


Getting ready 


By now, you should already be able to read CSV files. If not, please refer to the Importing data from CSV
recipe first.


How to do it... 


We will reuse the code from the Importing data from CSV recipe, where all we need to change 
is the dialect we are using as shown in the following code: 

import csv
import sys

filename = 'ch02-data.tab'

data = []
try:
    with open(filename, newline='') as f:
        # the only change from the CSV recipe: the excel_tab dialect
        reader = csv.reader(f, dialect=csv.excel_tab)
        header = next(reader)
        data = [row for row in reader]
except csv.Error as e:
    print("Error reading CSV file at line %s: %s" % (reader.line_num, e))
    sys.exit(-1)

if header:
    print(header)
    print('==================')

for datarow in data:
    print(datarow)


How it Works... 


The dialect-based approach is very similar to what we already did in the Importing data
from CSV recipe, except for the line where we instantiate the csv reader object, giving it
the parameter dialect and specifying the excel_tab dialect that we want.


There's more... 


A CSV-based approach will not work if the data is "dirty", that is, if certain lines do not
end with just a newline character but have additional \t (Tab) markers. So we need to
clean special lines separately before splitting them. The sample "dirty" tab-delimited file can
be found in ch02-data-dirty.tab. The following code sample cleans data as it reads it:

datafile = 'ch02-data-dirty.tab'

with open(datafile, 'r') as f:
    for line in f:
        # remove next comment to see line before cleanup
        # print('DIRTY: ', line.split('\t'))

        # we remove any whitespace (including stray tabs) at line start or end
        line = line.strip()

        # now we split the line by tab delimiter
        print(line.split('\t'))

We also see that there is another approach to do this: using the split('\t') function.

The advantage of using the csv module approach over split() is that we can reuse the
same code for reading by just changing the dialect and detecting it with the file extension
(.csv and .tab) or some other method (for example, using the csv.Sniffer class).
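Choosing the dialect from the file extension could look like the following sketch. The reader_for() helper is hypothetical; it assumes that .tab means tab-delimited and everything else is comma-separated, and in-memory strings stand in for real files:

```python
import csv
import io
import os

def reader_for(path, f):
    """Return a csv.reader whose dialect is picked from the file extension.

    Hypothetical helper: '.tab' files get the built-in excel_tab dialect,
    everything else falls back to the default excel (comma) dialect.
    """
    ext = os.path.splitext(path)[1].lower()
    dialect = csv.excel_tab if ext == '.tab' else csv.excel
    return csv.reader(f, dialect=dialect)

tab_rows = list(reader_for('ch02-data.tab', io.StringIO('day\tvalue\n1\t10\n')))
csv_rows = list(reader_for('ch02-data.csv', io.StringIO('day,value\n1,10\n')))
print(tab_rows == csv_rows)  # True: the same rows, whatever the delimiter
```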
Importing data from a JSON resource


This recipe will show us how we can read the JSON data format. Moreover, we'll be using a
remote resource in this recipe. It will add a tiny level of complexity to the recipe, but it will
also make it much more useful because in real life we will encounter more remote resources
than local ones.

JavaScript Object Notation (JSON) is widely used as a platform-independent format to
exchange data between systems or applications.

A resource, in this context, is anything we can read, be it a file or a URL endpoint (which can
be the output of a remote process/program or just a remote static file). In short, we don't
care who produced a resource and how they did it; we just need it to be in a known format
like JSON.


Getting ready 


In order to get started with this recipe, we need the requests module installed and
importable (in PYTHONPATH) in our virtual environment. We installed this module
in Chapter 1, Preparing Your Working Environment.

We aiso need Internet connectivity as we'll be reading a remote resource. 


How to do it... 


The following code sample performs reading and parsing of a user profile from
the GitHub (http://github.com) site. We will perform the following steps for this:

1. Define the GitHub URL of a JSON file with the details of a GitHub profile.

2. Get the contents from the URL using the requests module.

3. Read the content as JSON.

Here is the code for this: 

import requests
from pprint import pprint

url = 'https://api.github.com/users/justglowing'
r = requests.get(url)
json_obj = r.json()
pprint(json_obj)
How it Works...


First, we use the "requests" module to fetch a remote resource. This is very straightforward
as the "requests" module offers a simple API to define HTTP verbs, so we just need to issue
one get() method call. This method retrieves data and request metadata and wraps it in
the "Response" object, so we can inspect it. For this recipe, we are only interested in the
Response.json() method, which automatically reads the content (available at Response.content),
parses it as JSON, and loads it into a Python object.

Now that we have the JSON object, we can process the data. In order to do that, we need to
understand what the data looks like. We can achieve that understanding by opening the JSON
resource using our favorite web browser or a command-line tool such as wget or curl.

Another way is to fetch data from IPython and inspect it interactively. We can achieve that
by running our program from IPython (using %run program_name.py). After execution,
we are left with all the variables that the program produced. List them all using %who or %whos.

Whatever method we use, we gain knowledge about the structure of the JSON data and the 
ability to see what parts of that structure we are interested in. 

The JSON object is basically just a Python dictionary (or if stated in a more complex manner,
a dictionary of dictionaries) and we can access parts of it using a well-known, key-based
notation. In our example, the .json file contains the details of a GitHub profile, and we can
access the location of the user by referencing json_obj['location']. If we compare the
structure of the dictionary json_obj with that of the .json file, we see that each entry in
the .json file corresponds to a key in the dictionary. This means that the entire content of
the .json file is now in the dictionary (keep in mind that in Python versions before 3.7,
plain dictionaries do not preserve the order of the keys, so the order may differ from the file!).
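The same key-based access works on any JSON text, not only on a remote response. The sketch below parses a made-up, heavily trimmed stand-in for the GitHub profile with json.loads(), which is essentially what Response.json() does for us. The keys and values are assumptions, not real API output:

```python
import json

# Made-up, heavily trimmed stand-in for a GitHub profile document;
# a real API response contains many more keys.
raw = '{"login": "someuser", "location": "Somewhere", "public_repos": 3}'

json_obj = json.loads(raw)
print(type(json_obj).__name__)   # dict
print(json_obj['location'])      # Somewhere
print(list(json_obj))            # key order follows the document on Python 3.7+
```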


There's more... 


The JSON format (specified by RFC 4627; refer to http://tools.ietf.org/html/rfc4627.html)
became very popular recently as it is more human readable than XML
and is also less verbose. Hence, it's lighter in terms of the syntaxes required to transfer
data. It is very popular in the web application domain as it is native to JavaScript, the
language used for most of today's rich Internet applications.

The Python JSON module has more capabilities than we have displayed here; for example,
we could specialize the basic JSONEncoder/JSONDecoder class to transform our Python
data into JSON format. The classical example uses this approach to JSON-ify the Python
built-in type for complex numbers.

For simple customization, we don't have to subclass the JSONDecoder/JSONEncoder
class, as some of the parameters can solve our problems.
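As a sketch of that classical example, the encoder below overrides JSONEncoder.default() for complex numbers, and an object_hook function turns them back into complex values when decoding. The "__complex__" tag is a convention chosen for this example, not something that the json module defines:

```python
import json

class ComplexEncoder(json.JSONEncoder):
    """Serialize complex numbers as tagged JSON objects."""

    def default(self, obj):
        if isinstance(obj, complex):
            return {"__complex__": True, "real": obj.real, "imag": obj.imag}
        # Anything else: let the base class raise TypeError as usual.
        return super().default(obj)

def as_complex(dct):
    """object_hook for json.loads(): rebuild tagged objects as complex numbers."""
    if dct.get("__complex__"):
        return complex(dct["real"], dct["imag"])
    return dct

encoded = json.dumps({"z": 1 + 2j}, cls=ComplexEncoder)
decoded = json.loads(encoded, object_hook=as_complex)
print(encoded)   # {"z": {"__complex__": true, "real": 1.0, "imag": 2.0}}
print(decoded)   # {'z': (1+2j)}
```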
For example, json.loads() will parse a float as the Python type float, and most of the
time it will be right. Sometimes, however, the float value in the .json file represents a price
value, and this is better represented as a decimal. We can instruct the json parser to parse
floats as decimal. For example, we have the following JSON string:

jstring = '{"name":"prod1","price":12.50}'

This is followed by these lines of code:

import json
from decimal import Decimal
print(json.loads(jstring, parse_float=Decimal))

The preceding code will generate this output:

{'name': 'prod1', 'price': Decimal('12.50')}
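The reverse direction needs a similar hint, because json.dumps() does not know how to serialize Decimal values. One option, sketched below, is the default parameter, which is called for every object the encoder cannot handle. Emitting the price as a string is an assumption about what the consumer of the data expects:

```python
import json
from decimal import Decimal

jstring = '{"name":"prod1","price":12.50}'
product = json.loads(jstring, parse_float=Decimal)

# json.dumps() raises TypeError for Decimal unless we help it: default= is
# called for every object the encoder cannot handle. Here we emit the price
# as a string so that no precision is lost.
out = json.dumps(product, default=str)
print(out)   # {"name": "prod1", "price": "12.50"}
```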


Exporting data to JSON, CSV, and Excel 


As producers of data visualization, we mostly use other people's data, so importing
and reading data are our major activities. Still, we do need to write or export data that we
produce or process, whether it is for our own or others' current or future use.

We will demonstrate how to use the previously mentioned Python modules to import, export,
and write data to various formats such as JSON, CSV, and XLSX.

For demonstration purposes, we are using the pregenerated dataset from the Importing data 
from fixed-width data files recipe. 


Getting ready 


For the Excel writing part, we will need to install the xlwt module (inside our virtual
environment) by executing the following command:

$ pip install xlwt 


How to do it... 


We will present one code sample that contains all the formats that we want to demonstrate:
CSV, JSON, and XLSX. The main part of the program accepts the input and calls appropriate
functions to transform data. We will walk through separate sections of code explaining its
purpose, as shown here:

1. Import the required modules:

import os
import sys
import argparse
# io.StringIO and io.BytesIO replace the Python 2 cStringIO/StringIO modules
import io
import struct
import json
import csv

2. Then, define the appropriate functions for reading and writing data: 

def import_data(import_file):
    '''
    Imports data from import_file.
    Expects to find fixed width row
    Sample row: 161322597 0386544351896 0042
    '''
    mask = '9s14s5s'
    data = []
    # struct works on bytes, so the file is opened in binary mode
    with open(import_file, 'rb') as f:
        for line in f:
            # unpack line to tuple
            fields = struct.Struct(mask).unpack_from(line)
            # strip any whitespace for each field, decode it to text,
            # pack everything in a list and add to full dataset
            data.append([field.strip().decode('ascii') for field in fields])
    return data


def write_data(data, export_format):
    '''Dispatches call to a specific transformer and returns data set.
    The exception is xlsx, where the workbook is first saved into
    an in-memory file.
    '''
    if export_format == 'csv':
        return write_csv(data)
    elif export_format == 'json':
        return write_json(data)
    elif export_format == 'xlsx':
        return write_xlsx(data)
    else:
        raise Exception("Illegal format defined")
3. We specify a separate implementation for each data format (CSV, JSON,
and XLSX):

def write_csv(data):
    '''Transforms data into csv. Returns csv as string.
    '''
    # Using this to simulate file IO,
    # as csv can only write to files.
    f = io.StringIO()
    writer = csv.writer(f)
    for row in data:
        writer.writerow(row)
    # Get the content of the file-like object
    return f.getvalue()


def write_json(data):
    '''Transforms data into json. Very straightforward.
    '''
    j = json.dumps(data)
    return j


def write_xlsx(data):
    '''Writes data into an Excel workbook.
    Note that xlwt produces the legacy .xls (BIFF) format.
    '''
    from xlwt import Workbook
    book = Workbook()
    sheet1 = book.add_sheet("Sheet 1")
    row = 0
    for line in data:
        col = 0
        for datum in line:
            sheet1.write(row, col, datum)
            col += 1
        row += 1
        # We have hard limit here of 65535 rows
        # that we are able to save in spreadsheet.
        if row > 65535:
            print("Hit limit of # of rows in one sheet (65535).",
                  file=sys.stderr)
            break
    # XLS is special case where we have to save the workbook
    # into an in-memory binary file and return its content
    f = io.BytesIO()
    book.save(f)
    return f.getvalue()
4. Finally, we have the main code entry point, where we parse the command-line arguments
(the input file and the export format) to import data and export it to the required format:

if __name__ == '__main__':
    # parse input arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("import_file",
                        help="Path to a fixed-width data file.")
    parser.add_argument("export_format",
                        help="Export format: json, csv, xlsx.")
    args = parser.parse_args()

    if args.import_file is None:
        print("You must specify path to import from.", file=sys.stderr)
        sys.exit(1)

    if args.export_format not in ('csv', 'json', 'xlsx'):
        print("You must provide valid export file format.", file=sys.stderr)
        sys.exit(1)

    # verify given path is accessible file
    if not os.path.isfile(args.import_file):
        print("Given path is not a file: %s" % args.import_file,
              file=sys.stderr)
        sys.exit(1)

    # read from formatted fixed-width file
    data = import_data(args.import_file)

    # export data to specified format
    # to make this Unix-like pipe-able
    # we just print to stdout
    result = write_data(data, args.export_format)
    if isinstance(result, bytes):
        # XLS output is binary, so write the raw bytes to stdout
        sys.stdout.buffer.write(result)
    else:
        print(result)


How it Works... 


In one broad sentence, we import the fixed-width dataset (as defined in the Importing data from fixed-width data files recipe) and then export that to stdout, so we can capture it in a file or feed it as input to another program.

We call the program from the command line, giving two mandatory arguments: the input filename and the export data format (JSON, CSV, or XLSX).
If we successfully parse those arguments, we dispatch the input file reading to a function import_data(), which returns the Python data structure (list of lists) that we can easily manipulate to get to the appropriate export format.

We route our request inside the write_data() function, where we just forward a call to the appropriate function (for example, write_csv()).

For CSV, we obtain the csv.writer() instance that we use to write every line of data we iterate over.

We just return the given string, as we will redirect this output from our program to another program (or just copy it to a file).

The JSON export needs hardly any code in this example, as the json module provides us with the dumps() method that happily serializes our Python structure. Just as for CSV, we simply return the result, and it gets printed to stdout.
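The CSV and JSON writers above rely on Python 2's StringIO module. As a hedged sketch, assuming Python 3, where that module was replaced by io.StringIO, the same two functions could look like this; the sample rows are made up for illustration:

```python
import csv
import io
import json


def write_csv(data):
    '''Transforms data (a list of lists) into a CSV string.'''
    f = io.StringIO()  # in-memory, file-like text buffer
    # '\n' instead of csv's default '\r\n' so the output prints naturally
    writer = csv.writer(f, lineterminator='\n')
    for row in data:
        writer.writerow(row)
    return f.getvalue()


def write_json(data):
    '''Transforms data into a JSON string.'''
    return json.dumps(data)


rows = [['day', 'amount'], ['2013-01-24', 323], ['2013-01-25', 233]]
print(write_csv(rows))
print(write_json(rows))
```

Both functions still return plain strings, so the rest of the program (printing to stdout) stays the same.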

The Excel export requires more code, as we need to create a more complex model of the Excel workbook and worksheet(s) that will hold the data. This activity is followed by a similar iterative approach. We have two loops: the outer one goes over every line in the source dataset and the inner one iterates over every field in the given line.

After all this, we save the Workbook instance into a file-like stream whose content we return and print to stdout, so it can be used both to write files and to be consumed by a web service.


There's more... 


This, of course, is just a small set of possible data formats that we could be exporting to. It is fairly easy to modify the behavior. Basically, two places need changes: the import and export functions. The function for import needs to change if we want to import a new kind of data source.

If we want to add a new export format, we need to first add a function that will return a stream of formatted data. Then, we need to update the write_data() function to add a new elif branch that calls our new write_* function.
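One way to make adding formats easier, offered here as an alternative to the elif chain rather than as the recipe's own code, is to register writers in a dictionary. This Python 3 sketch uses a hypothetical write_tsv() as the newly added format; only the dictionary needs to change when a format is added:

```python
import json


def write_json(data):
    return json.dumps(data)


def write_tsv(data):
    '''Hypothetical extra format: tab-separated values.'''
    return ''.join('\t'.join(str(v) for v in row) + '\n' for row in data)


# Map each export format name to the function that produces it
WRITERS = {
    'json': write_json,
    'tsv': write_tsv,
}


def write_data(data, export_format):
    '''Dispatches to the writer registered for export_format.'''
    try:
        writer = WRITERS[export_format]
    except KeyError:
        raise ValueError("Illegal format defined: %r" % export_format)
    return writer(data)


print(write_data([['a', 1], ['b', 2]], 'tsv'))
```

A side benefit is that the argument check in the main block could test `export_format in WRITERS` instead of a hard-coded list of names.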

One thing we could also do is make this a Python package, so we can reuse it in more projects. In that case, we would like to make import more flexible and add some more configuration features for import.
Importing and manipulating data with Pandas


Until now, we have seen how to import and export data using mostly the tools provided in the Python Standard Library. Now, we'll see how to do some of the operations shown above in just a few lines using the Pandas library. Pandas is an open source, BSD-licensed library that simplifies data import and manipulation by providing data structures and parsing functions.

We will demonstrate how to import, manipulate, and export data using Pandas.


Getting ready 


To be able to use the code in this section, we need to install Pandas. This can be done, again, using pip as shown here:

pip install pandas 


How to do it... 


Here, we will import the data in ch02-data.csv again, add a new column to the original data, and export the result to CSV, as shown in the following code snippet:

import pandas as pd

data = pd.read_csv('ch02-data.csv')
data['amount_x_2'] = data['amount'] * 2
data.to_csv('ch02-data_more.csv')


How it Works... 


First, we import Pandas in our environment and then we use the function read_csv on the file that we want to read. This function automatically parses the CSV format and nicely organizes the data in an indexed structure called DataFrame. Then, we take the column amount, multiply each of its elements by two, and store the result in a new column called amount_x_2. Finally, we save the result into a new file named ch02-data_more.csv using the method to_csv. A DataFrame is a Pandas object that represents a table, and we can access its columns as shown in the following section.
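To try the same steps without ch02-data.csv at hand, the following Python 3 sketch feeds read_csv a few made-up rows through io.StringIO. It also shows two details worth knowing: to_csv returns the CSV text as a string when no path is given, and index=False leaves out the DataFrame's integer index, which to_csv otherwise writes as an extra first column:

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for ch02-data.csv
csv_text = "day,amount\n2013-01-24,323\n2013-01-25,233\n"

data = pd.read_csv(io.StringIO(csv_text))
data['amount_x_2'] = data['amount'] * 2  # vectorized: no explicit Python loop

# No path given, so to_csv returns the CSV text instead of writing a file
out = data.to_csv(index=False)
print(out)
```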
There's more...


DataFrames are very handy structures; they're designed to be fast and easy to access. Each column that they contain can also be accessed as an attribute of the object that represents the data frame (provided the column name is a valid Python identifier). For example, we can print the values in the column amount of the object data defined earlier as shown here:

>>> print data.amount
0    323
1    233
2    433
3    555
4    123
5      0
6    221
Name: amount, dtype: int64

We can also print the list of all the columns in a DataFrame, as shown in the following code:

>>> print data.columns
Index([u'day', u'amount'], dtype='object')

Also, the function read_csv that we used to import the data has many parameters that we can use to deal with messy files and parse particular data formats. For example, if the values of our files are delimited by spaces instead of commas, we can use the parameter delimiter to correctly parse the data. Here's an example where we import data from a file whose values are separated by a variable number of spaces, and we specify our own custom header:

pd.read_csv('ch02-data.tab', skiprows=1,
            delimiter=r'\s+', names=['day', 'amount'])
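A self-contained version of the call above, as a Python 3 sketch with made-up whitespace-separated content in place of ch02-data.tab: skiprows=1 drops the file's own header line, the regular expression r'\s+' treats any run of spaces or tabs as one separator, and names supplies the custom header. The last line also uses attribute-style column access:

```python
import io

import pandas as pd

# Hypothetical file content: irregular spacing plus an original header line
raw = """Day        Amount
2013-01-24    323
2013-01-25   233
2013-01-26      433
"""

data = pd.read_csv(io.StringIO(raw),
                   skiprows=1,               # drop the file's own header
                   delimiter=r'\s+',         # one or more whitespace chars
                   names=['day', 'amount'])  # our custom header
print(data)
print(data.amount)  # same Series as data['amount']
```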


Importing data from a database 


Very often, our work on data analysis and visualization is at the consumer end of the data pipeline. We most often use already produced data rather than producing the data ourselves. A modern application, for example, holds different datasets inside relational databases (or other databases like MongoDB), and we use these databases to produce beautiful graphs.

This recipe wiii show you how to use SQL drivers from Python to access data. 

We will demonstrate this recipe using a SQLite database because it requires the least effort to set up, but the interface is similar to most other SQL-based database engines (MySQL and PostgreSQL). There are, however, differences in the SQL dialect that those database engines support. This example uses simple SQL and should be reproducible on most common SQL database engines.
Getting ready


To be able to execute this recipe, we need to install the SQLite library as shown here: 

$ sudo apt-get install sqlite3

Python support for SQLite is available by default, so we don't need to install anything Python-related. Just fire the following code snippet in IPython to verify that everything is present:

import sqlite3
sqlite3.version
sqlite3.sqlite_version

We get output similar to the following:

In [1]: import sqlite3

In [2]: sqlite3.version
Out[2]: '2.6.0'

In [3]: sqlite3.sqlite_version
Out[3]: '3.8.4.3'

Here, sqlite3.version gets us the version of the Python sqlite3 module, and sqlite3.sqlite_version returns the system SQLite library version.


How to do it... 


To be able to read from the database, we need to perform the following steps:

1. Connect to the database engine (or the file in the case of SQLite).

2. Run the query against the selected tables.

3. Read the result returned from the database engine.

I will not try to teach SQL here, as there are many books on that particular topic. But just for the purpose of clarity, we will explain the SQL query in this code sample:

SELECT ID, Name, Population FROM City ORDER BY Population DESC LIMIT 1000

ID, Name, and Population are columns (fields) of the table City from which we select data. ORDER BY tells the database engine to sort our data by the Population column, and DESC means descending order. LIMIT allows us to get just the first 1,000 records found.
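To see ORDER BY ... DESC and LIMIT at work on something small, here is a Python 3 sketch against an in-memory SQLite database. The three-row City table and its numbers are made up, and the limit is passed as a bound parameter (LIMIT ?) instead of being written into the SQL string:

```python
import sqlite3

# Hypothetical miniature City table (made-up populations)
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE City (ID INTEGER, Name TEXT, Population INTEGER)')
con.executemany('INSERT INTO City VALUES (?, ?, ?)',
                [(1, 'Alpha', 500), (2, 'Beta', 1500), (3, 'Gamma', 1000)])

# Same shape as the query in the text: biggest cities first, then cut off
query = ('SELECT ID, Name, Population FROM City '
         'ORDER BY Population DESC LIMIT ?')
top_two = con.execute(query, (2,)).fetchall()
print(top_two)
con.close()
```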
For this example, we will use the world.sql example table, which holds the world's city names and populations. This table has more than 5,000 entries.

First, we need to import this SQL file into the SQLite database. Here's the code to do it:

import sqlite3
import sys

if len(sys.argv) < 2:
    print "Error: You must supply at least SQL script."
    print "Usage: %s ./sql-dump.sql [table.db]" % (sys.argv[0])
    sys.exit(1)

script_path = sys.argv[1]

if len(sys.argv) == 3:
    db = sys.argv[2]
else:
    # if DB is not defined
    # create memory database
    db = ":memory:"

try:
    con = sqlite3.connect(db)
    with con:
        cur = con.cursor()
        with open(script_path, 'rb') as f:
            cur.executescript(f.read())
except sqlite3.Error as err:
    print "Error occurred: %s" % err

This reads the SQL file and executes the SQL statements against the opened SQLite database file.

If we don't specify a database filename, SQLite creates the database in memory. The statements in the script are then executed one after another.

If we encounter any errors, we catch the exception and print the error message to the user.
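The core of the import script, executescript() plus the error handling, can be exercised without any files. In this Python 3 sketch, the SQL dump is a short made-up string and run_script() is a hypothetical helper name; ':memory:' gives the same on-disk-free database the script falls back to:

```python
import sqlite3

# A tiny stand-in for a SQL dump such as world.sql (hypothetical content)
SQL_DUMP = """
CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, Population INTEGER);
INSERT INTO City VALUES (1, 'Alpha', 500);
INSERT INTO City VALUES (2, 'Beta', 1500);
"""


def run_script(db, script):
    '''Hypothetical helper: runs a SQL script, returns the connection or None.'''
    con = sqlite3.connect(db)
    try:
        con.executescript(script)  # executes the statements one after another
    except sqlite3.Error as err:
        print("Error occurred: %s" % err)
        con.close()
        return None
    return con


con = run_script(':memory:', SQL_DUMP)  # ':memory:' means no file on disk
print(con.execute('SELECT COUNT(*) FROM City').fetchone()[0])
```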

After we have imported data into the database, we are able to query the data and do some processing. Here is the code to read the data from the database file:

import sqlite3
import sys

if len(sys.argv) != 2:
    print "Please specify database file."
    sys.exit(1)

db = sys.argv[1]

try:
    con = sqlite3.connect(db)
    with con:
        cur = con.cursor()
        query = ('SELECT ID, Name, Population FROM City '
                 'ORDER BY Population DESC LIMIT 1000')
        con.text_factory = str
        cur.execute(query)
        resultset = cur.fetchall()

        # extract column names
        col_names = [cn[0] for cn in cur.description]
        print "%10s %30s %10s" % tuple(col_names)
        print "=" * (10 + 1 + 30 + 1 + 10)

        for row in resultset:
            print "%10s %30s %10s" % row
except sqlite3.Error as err:
    print "[ERROR]:", err


Here's an example of how to use the two preceding scripts: 

$ python ch02-sqlite-import.py world.sql world.db 
$ python ch02-sqlite-read.py world.db 


        ID                           Name Population
====================================================
      1024                Mumbai (Bombay)   10500000
      2331                          Seoul    9981619
       206                      São Paulo    9968485
      1890                       Shanghai    9696300
How it Works...


First, we verify that the user has provided the database file path. This is just a quick check to ensure that we can proceed with the rest of the code.

Then, we try to connect to the database; if that fails, we catch sqlite3.Error and print it to the user.

If the connection is successful, we obtain a cursor using con.cursor(). A cursor is an iterator-like structure that enables us to traverse records of the result set returned from a database.

We define a query that we execute over the connection, and we fetch the result set using cur.fetchall(). Had we expected just one result, we would have used fetchone() instead.

A list comprehension over cur.description allows us to obtain column names. description is a read-only attribute and returns more than we need for just column names, so we just fetch the first item from every column's 7-item tuple.

We then use simple string formatting to print the header of our table with column names. 
After that, we iterate over resultset and print every row in a similar manner. 
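Both cursor details mentioned above can be checked in a few lines. In this Python 3 sketch on a made-up one-row table, sqlite3 fills only the first item (the column name) of each 7-item description tuple and leaves the other six as None, and fetchone() returns None once the rows run out:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE City (ID INTEGER, Name TEXT, Population INTEGER)')
con.execute("INSERT INTO City VALUES (1, 'Alpha', 500)")  # made-up row

cur = con.cursor()
cur.execute('SELECT ID, Name, Population FROM City')

# One 7-item tuple per column; item 0 is the column name
col_names = [cn[0] for cn in cur.description]
print(col_names)

# fetchone() hands back one row at a time, then None when exhausted
first = cur.fetchone()
second = cur.fetchone()
print(first, second)
```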


There's more... 


Databases are the most common sources of data today. We could not present everything in this short recipe, but we can suggest where to look for more information.

The official Python documentation is the first place to look for an explanation of how to work with databases. The most common databases are open source databases, such as MySQL, PostgreSQL, and SQLite, and on the other end of the spectrum, there are enterprise database systems such as MS SQL, Oracle, and Sybase. Python has support for most of them, and the interface is largely abstracted (by the DB-API specification, PEP 249), so you don't have to change your program if your underlying database changes, but some tweaks may be required. It depends on whether you have used the specifics of a particular database system. For example, Oracle supports a specific language, PL/SQL, that is not standard SQL, and some things will not work if your database changes from Oracle to MS SQL. Similarly, SQLite does not support specifics such as MySQL data types or database engine types (MyISAM and InnoDB). Those things can be annoying, but having your code rely on standard SQL (described at http://en.wikipedia.org/wiki/SQL:2011) will make your code portable from one database system to another.
Cleaning up data from outliers


This recipe describes how to deal with datasets coming from the real world and how to clean them before doing any visualization.

We will present a few techniques, which are different in essence but have the same goal: to get the data cleaned.

Cleaning, however, should not be fully automatic. We need to understand the data as given and be able to understand what the outliers are and what the data points represent before we apply any of the robust modern algorithms made to clean the data. This is not something that can be defined in a recipe because it relies on vast areas such as statistics, knowledge of the domain, and a good eye (and then some luck).


Getting ready 


We will use the Python modules we already know about (NumPy and matplotlib), so no additional installation is required.

In this recipe, I will introduce a new term: median absolute deviation (MAD). In statistics, MAD is a measure of the variability of a univariate (possessing one variable) sample of quantitative data. It is a measure of statistical dispersion. It belongs to the family of robust statistics, which means that it is more resilient to outliers.
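The code in this recipe links to the NIST handbook's description of the modified z-score of Iglewicz and Hoaglin. Written out for a one-dimensional sample $x_1, \dots, x_n$ (this notation is ours, not the code's), the quantities it computes are:

```latex
\begin{aligned}
\tilde{x} &= \operatorname{median}(x_1, \dots, x_n) \\
\mathrm{MAD} &= \operatorname{median}_i \, \lvert x_i - \tilde{x} \rvert \\
M_i &= \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}}
\end{aligned}
```

A point is flagged as an outlier when $\lvert M_i \rvert$ exceeds the threshold (3.5 by default in is_outlier()). The constant 0.6745 is approximately the 0.75 quantile of the standard normal distribution, which puts MAD on roughly the same scale as a standard deviation for normally distributed data.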


How to do it... 


Here's one example that shows how to use MAD to detect outliers in our data. We will perform the following steps for this:

1. Generate uniformly distributed random data in the interval [0, 1).

2. Add in a few outliers.

3. Use the function is_outlier() to detect the outliers.

4. Plot both the datasets (x and filtered) to see the difference.

Look at the following lines of code depicting this: 

import numpy as np
import matplotlib.pyplot as plt


def is_outlier(points, threshold=3.5):
    """
    This returns a boolean array with "True" if points are outliers and
    "False" otherwise.

    Data points with a modified z-score greater than threshold
    are classified as outliers.
    """
    # transform into vector
    if len(points.shape) == 1:
        points = points[:, None]

    # compute median value
    median = np.median(points, axis=0)

    # compute diff sums along the axis
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)

    # compute MAD
    med_abs_deviation = np.median(diff)

    # compute modified Z-score
    # http://www.itl.nist.gov/div898/handbook/eda/section4/eda43.htm#Iglewicz
    modified_z_score = 0.6745 * diff / med_abs_deviation

    # return a mask for each outlier
    return modified_z_score > threshold

# Random data
x = np.random.random(100)

# histogram buckets
buckets = 50

# Add in a few outliers
x = np.r_[x, -49, 95, 100, -100]

# Keep valid data points
# Note here that '~' is logical NOT on boolean numpy arrays
filtered = x[~is_outlier(x)]

# plot histograms
plt.figure()

plt.subplot(211)
plt.hist(x, buckets)
plt.xlabel('Raw')

plt.subplot(212)
plt.hist(filtered, buckets)
plt.xlabel('Cleaned')

plt.show()

Note that in NumPy, the ~ operator is overloaded to act as a logical NOT on Boolean arrays (on integer arrays, it is a bitwise NOT).

The preceding code produces two distinct histograms. The first one, which has been drawn using all the data, contains one main bar of height 100, holding all the values from the 0-1 interval, and a few other very small bars. This means that most of the samples were grouped in the main bar and the other bars just contain outliers. Indeed, in the second histogram, which has been drawn without the outliers, we can observe the details of the distribution of the data in the interval 0-1.
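Because the recipe uses random data, the mask is easiest to check on a small, fixed input. In this Python 3 sketch (made-up values), the median is 3, the absolute deviations are 2, 1, 0, 1, and 97, so the MAD is 1 and only the value 100 gets a modified z-score (about 65.4) above 3.5. The function is the recipe's is_outlier(), condensed and repeated so the block runs on its own:

```python
import numpy as np


def is_outlier(points, threshold=3.5):
    '''Same modified z-score test as in the recipe, repeated here.'''
    if len(points.shape) == 1:
        points = points[:, None]
    median = np.median(points, axis=0)
    diff = np.sqrt(np.sum((points - median) ** 2, axis=-1))
    med_abs_deviation = np.median(diff)
    modified_z_score = 0.6745 * diff / med_abs_deviation
    return modified_z_score > threshold


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
mask = is_outlier(x)
print(mask)      # only the last value is flagged
print(x[~mask])  # '~' flips the Boolean mask, keeping the inliers
```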



Another way to identify outliers is to visually inspect your data. In order to do so, we could create scatter plots, where we could easily spot values that are out of the central swarm, or create a box plot, which will display the median, quartiles above and below the median, and points that are distant even from the extremes of the distribution of the data.
The box extends from the lower to the upper quartile values of the data, with a line at the median. The whiskers extend from the box to the most extreme data points that still lie within 1.5 times the interquartile range (IQR) of the box, which is matplotlib's default. Flier points are those past the end of the whiskers.
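The numbers behind such a box plot can be computed directly. This Python 3 sketch uses a made-up sample and np.percentile (linear interpolation, its default) for the quartiles, then applies the 1.5 x IQR rule from matplotlib's default whis=1.5 to find the whisker ends and the fliers:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40], dtype=float)  # made-up sample

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the furthest points still inside these fences
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
inside = data[(data >= low_fence) & (data <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything beyond the fences is drawn as a flier
fliers = data[(data < low_fence) | (data > high_fence)]
print(q1, median, q3, whisker_low, whisker_high, fliers)
```

Here the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5; the whiskers end at 1 and 9, and 40 is the only flier.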

Here's an example to demonstrate that: 

from pylab import *

# fake up some data
spread = rand(50) * 100
center = ones(25) * 50

# generate some outliers high and low
flier_high = rand(10) * 100 + 100
flier_low = rand(10) * -100

# merge generated data set
data = concatenate((spread, center, flier_high, flier_low), 0)

subplot(311)
# basic plot
# 'gx' defining the outlier plotting properties
boxplot(data, notch=False, sym='gx')

# compare this with similar scatter plot
subplot(312)
spread_1 = concatenate((spread, flier_high, flier_low), 0)
center_1 = ones(70) * 25
scatter(center_1, spread_1)
xlim([0, 50])

# and with another that is more appropriate for
# scatter plot
subplot(313)
center_2 = rand(70) * 50
scatter(center_2, spread_1)
xlim([0, 50])

show()







We can then see x-shaped markers representing outliers, as shown in the following figure:



We can also see that the second plot, showing a similar dataset in a scatter plot, is not very intuitive because the x axis has all the values at 25 and we don't really distinguish between inliers and outliers.

The third plot, where we generated values on the x axis to be spread across the range from 0 to 50, gives us more visibility of the different values, and we can see which values are outliers in terms of the y axis.

What if we have a dataset with missing values? We can use NumPy loaders to compensate 
for missing values, or we can write code to replace existing values with the ones we need for 
further use. 
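The first option, NumPy loaders, can be sketched with np.genfromtxt. In this Python 3 example the CSV text is made up and has one empty field; by default the missing entry becomes nan, and filling_values substitutes a value of our choice instead:

```python
import io

import numpy as np

raw_text = "1,2\n3,\n5,6\n"  # made-up data: second value of row 2 is missing

# Missing entries become nan by default...
with_nan = np.genfromtxt(io.StringIO(raw_text), delimiter=',')

# ...or are replaced with a value we choose
filled = np.genfromtxt(io.StringIO(raw_text), delimiter=',', filling_values=0)

print(with_nan)
print(filled)
```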

For example, suppose we want to illustrate some dataset over the geographical map of the USA and have values for state names that are not consistent in the dataset. For instance, we have the values OH, ohio, OHio, us-OH, and OH-usA, all representing the state of Ohio. What we must do in this situation is inspect the dataset manually by loading it in a spreadsheet processor such as Microsoft Excel or OpenOffice.org Calc. Sometimes, it is easy enough to just print all the lines using Python. If the file is CSV or CSV-like, we can open it with any text editor and inspect the data directly.
After we have concluded what is present in the data, we can write Python code to group those similar values and replace them with the one value that is going to make further processing consistent. The usual way of doing this is to read in lines of the file using readlines() and use standard Python string manipulation functions to perform manipulations.
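For the Ohio example above, a small normalization function is often enough. In this Python 3 sketch, normalize_state() and the ALIASES lookup are hypothetical names, built only from the variants listed in the text: it strips a "us-" prefix or "-usa" suffix (in any case), lowercases the rest, and maps it to the canonical code, leaving unknown values untouched:

```python
import re

# Variants observed while inspecting the data (from the example above)
raw_states = ['OH', 'ohio', 'OHio', 'us-OH', 'OH-usA', 'TX']

# Hypothetical lookup: lowercase variants -> canonical state code
ALIASES = {'ohio': 'OH', 'oh': 'OH'}


def normalize_state(value):
    '''Maps known spellings to one canonical code; leaves others as-is.'''
    # drop a leading "us-" or trailing "-usa" in any letter case
    core = re.sub(r'^us-|-usa$', '', value.strip(), flags=re.IGNORECASE)
    return ALIASES.get(core.lower(), value.strip())


cleaned = [normalize_state(s) for s in raw_states]
print(cleaned)
```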


There's more... 


There are special products, both commercial and non-commercial (such as OpenRefine, available at https://github.com/OpenRefine), that provide some automation around transforming "dirty" live datasets.

Manual work is still involved, depending on how noisy the data is and how good our understanding of that data is.

If you want to find out more about cleaning outliers and cleaning data in general, look into statistical models and sampling theory.


Reading files in chunks 


Python is very good at reading and writing files or file-like objects. For example, if you try to load big files, say a few hundred MB, assuming you have a modern machine with at least 2 GB of RAM, Python will be able to handle it without any issue. When iterating over a file, it will not try to load everything at once, but plays smart and loads it as needed.

So even with decent file sizes, doing something as simple as the following code will work straight out of the box:

with open('/tmp/my_big_file', 'r') as bigfile:
    for line in bigfile:
        # line based operation, like 'print line'
        pass

But if we want to jump to a particular place in the file or do other nonsequential reading, we will need to use the handcrafted approach and use IO functions such as seek(), tell(), read(), and next() that allow enough flexibility for most users. Most of these functions are just bindings to C implementations (and are OS-specific), so they are fast, but their behavior can vary based on the OS we are running.
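Here is how seek() and tell() cooperate, as a Python 3 sketch. An io.BytesIO object with made-up content stands in for a file opened in binary mode, so positions are plain byte offsets:

```python
import io

# In-memory stand-in for a file opened with 'rb' (hypothetical content)
f = io.BytesIO(b"line one\nline two\nline three\n")

first = f.readline()         # reads b"line one\n"
pos = f.tell()               # 9: bytes consumed so far
f.seek(14)                   # jump into the middle of "line two"
rest_of_line = f.readline()  # reads b"two\n"
f.seek(0)                    # rewind to the start
head = f.read(4)             # reads b"line"
print(first, pos, rest_of_line, head)
```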
How to do it...


Depending on what our aim is, processing large files can sometimes be managed in chunks. For example, you could read 1,000 lines and process them using Python's standard iterator-based approaches, as shown here:

import sys

filename = sys.argv[1]  # must pass valid file name

with open(filename, 'rb') as hugefile:
    chunksize = 1000
    readable = ''
    eof = False

    # if you want to stop after certain number of blocks
    # put condition in the while
    while not eof:
        # if you want to start not from 1st byte
        # do a hugefile.seek(skipbytes) to skip
        # skipbytes of bytes from the file start
        start = hugefile.tell()
        print "starting at:", start

        file_block = ''  # holds chunksize of lines
        for _ in xrange(chunksize):
            try:
                line = hugefile.next()
            except StopIteration:
                # no more lines: finish this block and stop the while loop
                eof = True
                break
            file_block = file_block + line

        print 'file_block', type(file_block), file_block
        readable = readable + file_block

        # tell where are we in file
        # file IO is usually buffered so tell()
        # will not be precise for every read.
        stop = hugefile.tell()

        print 'readable', type(readable), readable
        print 'reading bytes from %s to %s' % (start, stop)
        print 'read bytes total:', len(readable)

        # if you want to pause read between chunks
        # uncomment following line
        #raw_input()

We call this code from the command line with the Python interpreter, giving the filename path as the first parameter:

$ python ch02-chunk-read.py myhugefile.dat 
How it Works...


We want to be able to read blocks of lines for processing without reading the whole file into memory.

We open the file and read in lines in the inner for loop. The way we move through the file is by calling next() on the file object. This function reads the line from the file and moves the file pointer to the next line. We append lines to the file_block variable during the loop execution. In order to simplify the example code, we don't do any processing; we just append file_block to the output variable readable.

We do some printing during execution just to illustrate the current state of certain variables. 

The last line in the while loop, raw_input(), is commented out; if we uncomment it, we can pause the execution after each chunk and read the lines printed above it.
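In Python 3 the same chunking idea is often written with itertools.islice, which takes up to a given number of lines from the file iterator and simply returns fewer (or none) at the end of the file. read_in_chunks() is a hypothetical name for this sketch, and an io.StringIO object with 2,500 made-up lines stands in for the huge file:

```python
import io
from itertools import islice


def read_in_chunks(fileobj, chunksize=1000):
    '''Yields lists of up to chunksize lines until the file is exhausted.'''
    while True:
        chunk = list(islice(fileobj, chunksize))
        if not chunk:  # nothing left: stop the generator
            return
        yield chunk


# Stand-in for a huge file: 2,500 short lines (made-up content)
fake_file = io.StringIO(''.join('row %d\n' % i for i in range(2500)))
sizes = [len(chunk) for chunk in read_in_chunks(fake_file)]
print(sizes)
```

Only one chunk is held in memory at a time; with a real file, `with open(filename) as f: for chunk in read_in_chunks(f): ...` works the same way.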


There's more... 


This recipe is, of course, just one of the possible approaches to reading large (huge) files. Other approaches could include specific Python or C libraries, but they all depend on what we aim to do with data and how we want to process it.

Parallel approaches like the MapReduce paradigm have become very popular recently as we get more processing power and memory for a low price.

Multiprocessing is also sometimes a feasible approach, as Python has good library support for creating and managing processes and threads, with several libraries such as multiprocessing, threading, and thread.

If processing huge files is a repeated process for a project, we suggest building your data pipeline so that every time you need data ready in a specific format on the output end, you don't have to go to the source and do it manually.


Reading streaming data sources 


What if the data that is coming from the source is continuous? What if we need to read continuous data? This recipe will demonstrate a simple solution that will work for many common real-life scenarios, but it is not universal and you will need to modify it if you hit a special case in your application.
How to do it...


In this recipe, we will show you how to read an ever-changing file and print the output. We will use common Python modules to accomplish this, as shown here:

import time
import os
import sys

if len(sys.argv) != 2:
    print >> sys.stderr, "Please specify filename to read"
    sys.exit(1)

filename = sys.argv[1]

if not os.path.isfile(filename):
    print >> sys.stderr, "Given file: \"%s\" is not a file" % filename
    sys.exit(1)

with open(filename, 'r') as f:
    # Move to the end of file
    filesize = os.stat(filename).st_size
    f.seek(filesize)

    # endlessly loop
    while True:
        where = f.tell()
        # try reading a line
        line = f.readline()
        # if empty, go back
        if not line:
            time.sleep(1)
            f.seek(where)
        else:
            # , at the end prevents print to add newline, as readline()
            # already read that.
            print line,


How it Works... 


The core of the code is inside the while True: loop. This loop never stops (unless we interrupt it by pressing Ctrl + C on our keyboard). We first move to the end of the file we are reading and then we try to read a line. If there is no line, that means nothing was added to the file since our last read, so we sleep for one second, seek() back to the saved position, and try again.

If there is a non-empty line, we print it out, suppressing the extra newline character that print would otherwise add (the line already ends with one).
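The same loop can be wrapped as a Python 3 generator, here given the hypothetical name follow(), so the caller decides what to do with each new line; the poll_interval parameter replaces the hard-coded one-second sleep. This is a sketch of the recipe's technique, not a drop-in replacement for tools such as tail -f:

```python
import os
import time


def follow(f, poll_interval=1.0):
    '''Yields lines appended to the open file f, sleeping while idle.'''
    while True:
        where = f.tell()
        line = f.readline()
        if not line:
            # nothing new yet: wait, then retry from the same position
            time.sleep(poll_interval)
            f.seek(where)
        else:
            yield line
```

A caller would open the file, move to its end with f.seek(0, os.SEEK_END), and then loop with `for line in follow(f): print(line, end='')`.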
There's more...


We might want to read the last N lines. We could do that by going almost to the end of the file, that is, with file.seek(filesize - N * avg_line_len). Here, avg_line_len should be an approximation of the average line length in that file (say, 1,024). Then, we could use readlines() from that point to read the lines and then print just the last N lines, [-N:], from that list.
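The last-N-lines idea can be sketched in Python 3 as follows; tail() is a hypothetical name. Because avg_line_len is only a guess, the sketch widens the window until it has read more than N lines (so the possibly cut-off first line is not among the last N) or until it reaches the start of the file:

```python
import os


def tail(path, n, avg_line_len=1024):
    '''Returns the last n lines of path (as bytes) without reading it all.'''
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        filesize = f.tell()
        window = n * avg_line_len  # first guess of how many bytes to read
        while True:
            start = max(filesize - window, 0)
            f.seek(start)
            lines = f.readlines()
            # more than n lines: the partial first line is not in the last n;
            # start == 0: nothing was cut off at all
            if len(lines) > n or start == 0:
                return lines[-n:]
            window *= 2  # guess was too small: widen the window
```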

The idea from this example can be used for various solutions. For example, the input could be a file-like object or a remote HTTP-accessible resource. Thus, one can read the input from a remote service and continuously parse it and update live charts or update the intermediate queue, buffer, or database.

One particular module is very useful for stream handling: io. It has been in Python since version 2.6, is built as a replacement for the built-in file object, and is the default interface in Python 3.x.

In some more complex data pipelines, we will need to enable some sort of message queues, where our incoming continuous data will have to be queued for some time before we are able to accept it. This enables us, as consumers of the data, to be able to pause processing if we are overloaded. Having data on the common message bus enables other clients on the project to consume the same data and not interfere with our software.


Importing image data into NumPy arrays 


We are going to demonstrate how to do image processing using Python's libraries such as NumPy and SciPy.

In scientific computing, images are usually seen as n-dimensional arrays. They are usually 
two-dimensional arrays; in our examples, they are represented as a NumPy array data 
structure. Therefore, functions and operations performed on those structures are seen 
as matrix operations. 
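To make the array view concrete, here is a Python 3 sketch with a made-up 3x4 grayscale image: each element is one 8-bit pixel intensity, and common image operations become ordinary NumPy array operations:

```python
import numpy as np

# A made-up 3x4 grayscale "image": one 8-bit intensity per pixel
img = np.array([[0, 50, 100, 150],
                [10, 60, 110, 160],
                [20, 70, 120, 255]], dtype=np.uint8)

# rows x columns, pixel type, brightest value
print(img.shape, img.dtype, img.max())

negative = 255 - img               # invert intensities, element by element
flipped = img[:, ::-1]             # mirror left-right by reversing columns
rgb = np.dstack([img, img, img])   # a color image adds a third (channel) axis
print(rgb.shape)
```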

Images in this sense are not always two-dimensional. For medical or bio-sciences, images are data structures of higher dimensions such as 3D (having the z axis as depth or as the time axis) or 4D (having three spatial dimensions and a temporal one as the fourth dimension).

We will not be using those in this recipe. 

We can import images using various techniques; they all depend on what you want to do with the image. Also, it depends on the larger ecosystem of tools you are using and the platform you are running your project on.

In this recipe, we will demonstrate several ways to use image processing in Python, mainly related to scientific processing and less on the artistic side of image manipulation.
Getting ready


In some examples in this recipe, we use the SciPy library, which you may already have installed together with NumPy. If you haven't, it is easily installable using your OS's package manager by executing the following command:

$ sudo apt-get install python-scipy 

For Windows users, we recommend using prepackaged Python environments like EPD, which we discussed in Chapter 1, Preparing Your Working Environment.

If you want to install these using official source distributions, make sure you have installed system dependencies, such as:

► BLAS and LAPACK: libblas and liblapack

► C and Fortran compilers: gcc and gfortran


How to do it... 


Whoever has worked in the field of digital signal processing or even attended a university course on this or a related subject must have come across Lena's image, the de facto standard image used for demonstrating image processing algorithms.

SciPy contains this image already packed inside the misc module, so it is really simple for us to reuse that image. This is how you can read and show this image:

import scipy.misc
import matplotlib.pyplot as plt

# load already prepared ndarray from scipy
lena = scipy.misc.lena()

# set the default colormap to gray
plt.gray()

plt.imshow(lena)
plt.colorbar()
plt.show()
This should open a new window with a figure displaying Lena's image in gray tones and axes. The color bar shows the range of values in the figure; here it goes from 0 (black) to 255 (white).



Further, we could examine this object with the following code:

print lena.shape
print lena.max()
print lena.dtype

The output for the preceding code is shown here: 

(512, 512) 

245 

dtype('int32') 

We see the following features in the image:

► It is 512 points wide and 512 points high

► The max value in the whole array (that is, the image) is 245

► Every point is represented as a little-endian 32-bit integer

We could also read in an image using the Python Imaging Library (PIL), which we installed in Chapter 1, Preparing Your Working Environment.
Here is the code to do that:

import numpy
from PIL import Image
import matplotlib.pyplot as plt

bug = Image.open('stinkbug.png')

# PIL's size is (width, height); the array needs (rows, columns, channels)
arr = numpy.array(bug.getdata(), numpy.uint8).reshape(bug.size[1],
                                                      bug.size[0], 3)

plt.gray()
plt.imshow(arr)
plt.colorbar()
plt.show()

We should see a figure similar to the one for Lena's image, as shown in the following figure:



This is useful if we are already tapping into an existing system that uses PIL as its default image loader.
How it Works...


Other than just loading the images, what we really want to do is use Python to manipulate images and process them. For example, we want to be able to load a real image that consists of RGB channels, convert that into a one-channel ndarray, and later use array slicing to also zoom in to part of the image. Here's the code to demonstrate how we are able to use NumPy and matplotlib to do that:

import matplotlib.pyplot as plt
import scipy.misc
import numpy

bug = scipy.misc.imread('stinkbug1.png')

# if you want to inspect the shape of the loaded image
# uncomment following line
#print bug.shape

# the original image is RGB having values for all three
# channels separately. For simplicity, we convert that to a greyscale
# image by picking up just one channel.

# convert to gray
bug = bug[:, :, 0]

bug[:, :, 0] is called array slicing. This NumPy feature allows us to select any part of a
multidimensional array. For example, let's look at a one-dimensional array:

>>> from numpy import array
>>> a = array([5, 1, 2, 3, 4])
>>> a[2:3]
array([2])
>>> a[:2]
array([5, 1])
>>> a[3:]
array([3, 4])

For multidimensional arrays, we separate each dimension with a comma (,) as shown here:

>>> b = array([[1,1,1], [2,2,2], [3,3,3]]) # matrix 3x3
>>> b[0,:] # pick first row
array([1, 1, 1])
>>> b[:,0] # we pick the first column
array([1, 2, 3])
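The same slicing rules explain what bug[:, :, 0] does to a color image: the first two slices keep every row and every column, and the 0 keeps only the first channel. A minimal sketch on a synthetic array, assuming a tiny 4 x 5 RGB "image" whose values are simply 0, 1, 2, ... (the names rgb and red are made up for illustration):

```python
import numpy as np

# a synthetic 4 x 5 "image" with three channels (R, G, B);
# pixel values are just 0, 1, 2, ... so every slice is easy to check
rgb = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)

# keep all rows and all columns, but only channel 0 (red)
red = rgb[:, :, 0]

print(rgb.shape)   # (4, 5, 3)
print(red.shape)   # (4, 5): the channel axis is gone
print(red[0])      # first row of the red channel: 0, 3, 6, 9, 12
```

The slice is a view, not a copy, so writing into red also changes rgb; call .copy() when an independent array is needed.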
Have a look at the following code:


# Show original image
plt.figure()
plt.gray()

plt.subplot(121)
plt.imshow(bug)

# Show 'zoomed' region
zbug = bug[100:350,140:350]

Here we zoom into a particular portion of the whole image. Remember that the image is
just a multidimensional array represented as a NumPy array. Zooming here means selecting
a range of rows and columns from this matrix. So we select a partial matrix from rows 100 to
350 and columns 140 to 350; the end index of a slice is exclusive, so these are rows 100 to 349
and columns 140 to 349. Remember that indexing starts at 0, so the row at coordinate
100 is the 101st row.
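The shape of the zoomed region follows directly from the slice bounds. A quick check on a stand-in array, assuming a 375 x 500 grayscale image; the array is all zeros except for one marked pixel, and img and zoom are illustrative names only:

```python
import numpy as np

# stand-in for a grayscale image: 375 rows x 500 columns (assumed size)
img = np.zeros((375, 500), dtype=np.uint8)

# mark one pixel so we can find it again inside the zoomed region
img[100, 140] = 255

# rows 100..349 and columns 140..349 (end indices are exclusive)
zoom = img[100:350, 140:350]

print(zoom.shape)   # (250, 210)
print(zoom[0, 0])   # 255: the top-left of the zoom is img[100, 140]
```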

Take a look at the following code:

plt.subplot(122)
plt.imshow(zbug)

plt.show()

This will be displayed as shown here:

There's more...


For large images, we recommend using numpy.memmap for memory mapping of images.
This will speed up manipulating the image data. Have a look at the following code as an
example of this:

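import numpy

file_name = 'stinkbug.png'

# memmap maps the raw bytes of the file; for a compressed format such as
# PNG these are not pixel values, so this suits raw, uncompressed image data
image = numpy.memmap(file_name, dtype=numpy.uint8, shape=(375, 500))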

Here we map a (possibly large) file into memory, accessing it as a NumPy array. This is very
efficient and allows us to manipulate file data structures as standard NumPy arrays without
loading everything into memory. The shape argument defines the shape of the array loaded
from file_name, which can be a filename or a file-like object. Note that this is a concept similar
to Python's mmap module (available at http://docs.python.org/2/library/mmap.html)
but is different in a very important way. NumPy's memmap returns an array-like
object while Python's mmap returns a file-like object. So, the way we use them is very different
yet very natural in each environment.
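One caveat is worth spelling out: numpy.memmap maps the bytes of a file exactly as they are stored on disk, so it yields pixel values only for uncompressed raw data, while a PNG file stores compressed data. A minimal sketch, assuming a raw 8-bit image of 375 x 500 pixels with no header; the file name image.raw and the synthetic pixel values are made up, and the file is written with ndarray.tofile() first so that the example is self-contained:

```python
import os
import tempfile

import numpy as np

shape = (375, 500)  # assumed image size: rows x columns

# write a raw, uncompressed 8-bit image to disk (no header, just pixels)
pixels = (np.arange(shape[0] * shape[1]) % 256).astype(np.uint8).reshape(shape)
path = os.path.join(tempfile.mkdtemp(), 'image.raw')  # hypothetical file name
pixels.tofile(path)

# map the file read-only; data is read from disk only when it is accessed
image = np.memmap(path, dtype=np.uint8, mode='r', shape=shape)

# slicing works as with any ndarray
corner = np.array(image[:10, :10])

print(corner.shape)                   # (10, 10)
print(np.array_equal(image, pixels))  # True
```

Passing mode='r' keeps the mapping read-only; the default mode, 'r+', would also allow writes back to the file.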

There are some specialized packages that focus just on image processing, like scikit-image
(available at http://scikit-image.org/); this is basically a free collection of algorithms
for image processing built on top of the NumPy/SciPy libraries. If you want to do edge detection,
remove noise from an image, or find contours, scikit-image is the tool to look to for algorithms.
The best way to start is to look at the example gallery and find the example image and code
(available at http://scikit-image.org/docs/dev/auto_examples/).


Generating controlled random datasets 


In this recipe, we will show different ways of generating random number sequences and
word sequences. Some of the examples use standard Python modules, and others use
NumPy/SciPy functions.

We will go through some statistics terminology but will explain every term, so you don't
have to have a statistical reference book with you while reading this recipe.

We generate artificial datasets using common Python modules. By doing so, we are able
to understand distributions, variance, sampling, and similar statistical terminology. More
importantly, we can use this fake data as a way to understand if our statistical method is
capable of discovering the models we want to discover. We can do that because we know the
model in advance and can verify our statistical method by applying it over our known data. In
real life, we don't have that ability, and there is always a percentage of uncertainty that we
must assume, giving way to errors.

Getting ready


We don't need anything new installed on the system in order to exercise these examples.
Having some knowledge of statistics is useful, although not required.

To refresh our statistical knowledge, here's a little glossary we will use in this and the
following chapters:

► Distribution or probability distribution: This links the outcome of a statistical
experiment with the probability of occurrence of that outcome.

► Standard deviation: This is a numerical value that indicates how individuals vary in
comparison to a group. If they vary more, the standard deviation will be big, and in
the opposite case, if all the individual experiments are more or less the same
across the whole group, the standard deviation will be small.

► Variance: This equals the square of the standard deviation.

► Population or statistical population: This is the total set of all the potentially observable
cases. For example, all the grades of all the students in the world if we are interested in
getting the student average of the world.

► Sample: This is a subset of the population. We cannot obtain all the grades of all the
students in the world, so we have to gather only a sample of data and model it.
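To make the glossary concrete, here is the arithmetic for a small, made-up sample of eight values, computed by hand in plain Python. The n - 1 divisor shown at the end is the usual estimate when the data is a sample rather than the whole population:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values for illustration

n = len(data)
mean = sum(data) / float(n)

# variance: the average squared distance from the mean
# (dividing by n treats data as the whole population)
pop_variance = sum((v - mean) ** 2 for v in data) / n
pop_std = math.sqrt(pop_variance)   # standard deviation

# when data is only a sample of a bigger population, the usual
# estimate divides by n - 1 instead (Bessel's correction)
sample_variance = sum((v - mean) ** 2 for v in data) / (n - 1)

print(mean)          # 5.0
print(pop_variance)  # 4.0
print(pop_std)       # 2.0, so variance == standard deviation ** 2
```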


How to do it... 


We can generate a simple random sample using Python's random module. Here's an example
of this:

import pylab
import random

SAMPLE_SIZE = 100

# seed random generator
# if no argument is provided, it uses an OS randomness
# source if available, else the current system time
random.seed()

# store generated random values here
real_rand_vars = []

# pick some random values
real_rand_vars = [random.random() for val in xrange(SAMPLE_SIZE)]

# create histogram from data in 10 buckets
pylab.hist(real_rand_vars, 10)

# define x and y labels
pylab.xlabel("Number range")
pylab.ylabel("Count")

# show figure
pylab.show()

This is a uniformly distributed sample. When we run this example, we should see something
similar to the following plot:



Try setting SAMPLE_SIZE to a big number (say 10000) and see how the histogram behaves.

If we want values that range not from 0 to 1 but, say, from 1 to 6 (simulating single
die throws), we could use random.randint(min, max); here, min and max are the inclusive
lower and upper bounds, respectively. If what you want to generate are floats and not
integers, there is a random.uniform(min, max) function to provide that.
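A minimal sketch of both calls for the die example, assuming a private random.Random instance with an arbitrary seed (1) so that runs are repeatable; the Counter at the end simply confirms that all six faces appear over many throws:

```python
import collections
import random

rng = random.Random(1)  # private generator; the seed 1 is arbitrary

# ten throws of a six-sided die: randint includes both bounds
throws = [rng.randint(1, 6) for _ in range(10)]

# floats between 1 and 6 instead of whole numbers
floats = [rng.uniform(1, 6) for _ in range(10)]

# over many throws every face shows up, roughly equally often
counts = collections.Counter(rng.randint(1, 6) for _ in range(6000))

print(throws)
print(sorted(counts))  # [1, 2, 3, 4, 5, 6]
```

Using a separate Random instance, rather than the module-level functions, keeps this generator's state independent of any other code that calls random.seed().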

In a similar fashion and using the same tools, we can generate a time series plot of fictional 
price growth data with some random noise, as shown here: 

import pylab
import random

# days to generate data for
duration = 100

# mean value
mean_inc = 0.2

# standard deviation
std_dev_inc = 1.2

# time series
x = range(duration)
y = []
price_today = 0

for i in x:
    next_delta = random.normalvariate(mean_inc, std_dev_inc)
    price_today += next_delta
    y.append(price_today)

pylab.plot(x, y)
pylab.xlabel("Time")
pylab.ylabel("Value")
pylab.show()

This code defines a series of 100 data points (fictional days). For every next day, we pick
a random value from the normal distribution (random.normalvariate()) with mean
mean_inc and standard deviation std_dev_inc, and add that value to yesterday's price value (price_today).
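Because each day's price is yesterday's price plus a normally distributed step, the series is a running (cumulative) sum of the steps. A minimal sketch that builds it both ways, assuming Python 3.2+ for itertools.accumulate and an arbitrary seed (7) so the result can be checked:

```python
import itertools
import random

duration = 100
mean_inc = 0.2     # mean of each daily step
std_dev_inc = 1.2  # standard deviation of each daily step

rng = random.Random(7)  # arbitrary seed so the run can be repeated

# one normally distributed step per day
steps = [rng.normalvariate(mean_inc, std_dev_inc) for _ in range(duration)]

# running total with an explicit loop, as in the recipe
loop_prices = []
price_today = 0
for step in steps:
    price_today += step
    loop_prices.append(price_today)

# the same running total with itertools.accumulate
prices = list(itertools.accumulate(steps))

print(len(prices))            # 100
print(prices == loop_prices)  # True
```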

If we wanted more control, we could use different distributions. The following code illustrates
and visualizes different distributions. We will comment on separate code sections as we present
them. We start by importing the required modules and defining a number of histogram buckets.
We also create a figure that will hold our histograms, as shown in the following lines of code:

# coding: utf-8 
import random 
import matplotlib 

import matplotlib.pyplot as plt 


SAMPLE_SIZE = 1000 

# histogram buckets 
buckets = 100 

plt.figure()

# we need to update font size just for this example 
matplotlib.rcParams.update({'font.size': 7}) 
To lay out all the required plots, we define a grid of six by two subplots for all the histograms.
The first plot is a uniformly distributed random variable, as seen in the following lines of code:

plt.subplot(621)
plt.xlabel("random.random")

# Return the next random floating point number in the range [0.0, 1.0).
res = [random.random() for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)

For the second plot, we plot a uniformly distributed random variable over a custom range, as shown here:

plt.subplot(622)
plt.xlabel("random.uniform")

# Return a random floating point number N such that a <= N <= b for
# a <= b and b <= N <= a for b < a.
# The end-point value b may or may not be included in the range
# depending on floating-point rounding in the equation a + (b-a) * random().
a = 1
b = SAMPLE_SIZE

res = [random.uniform(a, b) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)

Here is the third plot, which is a triangular distribution:

plt.subplot(623)
plt.xlabel("random.triangular")

# Return a random floating point number N such that low <= N <= high
# and with the specified mode between those bounds. The low and high
# bounds default to zero and one. The mode argument defaults to the
# midpoint between the bounds, giving a symmetric distribution.
low = 1
high = SAMPLE_SIZE

res = [random.triangular(low, high) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)

The fourth plot is a beta distribution. The condition on the parameters is that alpha and beta
should be greater than zero. The returned values range between 0 and 1.

plt.subplot(624)
plt.xlabel("random.betavariate")
alpha = 1
beta = 10

res = [random.betavariate(alpha, beta) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)
The fifth plot visualizes an exponential distribution. lambd is 1.0 divided by the desired mean.
It should be nonzero (the parameter would be called lambda, but that is a reserved word in
Python). The returned values range from 0 to positive infinity if lambd is positive, and from
negative infinity to 0 if lambd is negative, as shown here:

plt.subplot(625)
plt.xlabel("random.expovariate")
lambd = 1.0 / ((SAMPLE_SIZE + 1) / 2.)

res = [random.expovariate(lambd) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)

Our next plot is the gamma distribution, where the condition on the parameters is that alpha
and beta are greater than 0. The probability distribution function is shown here:


$$\mathrm{pdf}(x) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}$$
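The density can be evaluated directly as a sanity check, separately from the histogram script. This is a minimal sketch, assuming the form pdf(x) = x**(alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha) given in the Python random module documentation. gamma_pdf is a hypothetical helper name, and the integral is a crude midpoint sum over [0, 200]:

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density for x > 0 (hypothetical helper; alpha, beta > 0)."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)


alpha, beta = 1, 10   # the same parameters as the gamma histogram

# with alpha = 1 the density reduces to exp(-x / beta) / beta,
# an exponential distribution with mean beta
print(gamma_pdf(0.5, alpha, beta))

# crude numeric check that the density integrates to (about) 1
step = 0.01
area = sum(gamma_pdf(step * (i + 0.5), alpha, beta) * step for i in range(20000))
print(round(area, 3))   # 1.0
```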





Here's the code for the gamma distribution: 
plt.subplot(626)
plt.xlabel("random.gammavariate")

alpha = 1
beta = 10

res = [random.gammavariate(alpha, beta) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)

Log-normal distribution is our next plot. If you take the natural logarithm of this distribution,
you'll get a normal distribution with mean mu and standard deviation sigma. mu can
have any value, but sigma must be greater than zero, as shown here:

plt.subplot(627)
plt.xlabel("random.lognormvariate")
mu = 1
sigma = 0.5

res = [random.lognormvariate(mu, sigma) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)








The next plot is the normal distribution, where mu is the mean and sigma is the standard
deviation, as shown here:
plt.subplot(628)
plt.xlabel("random.normalvariate")
mu = 1
sigma = 0.5

res = [random.normalvariate(mu, sigma) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)

Here is the last plot, which is the Pareto distribution, where alpha is the shape parameter:

plt.subplot(629)
plt.xlabel("random.paretovariate")
alpha = 1

res = [random.paretovariate(alpha) for _ in xrange(SAMPLE_SIZE)]
plt.hist(res, buckets)

plt.tight_layout()
plt.show()

This was a big code example, but basically we pick 1,000 random numbers according to
various distributions. These are common distributions used in different statistical branches
(economics, sociology, bio-sciences, and so on).
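The log-normal property described earlier can be checked numerically, outside the plotting script: the logarithms of log-normal draws should look normal, with mean close to mu and standard deviation close to sigma. A minimal sketch, assuming Python 3.4+ for the statistics module, an arbitrary seed (0), and the same mu = 1 and sigma = 0.5 used for the log-normal histogram:

```python
import math
import random
import statistics

mu, sigma = 1, 0.5        # same parameters as the log-normal histogram
rng = random.Random(0)    # arbitrary seed, so the run is repeatable

draws = [rng.lognormvariate(mu, sigma) for _ in range(100000)]
logs = [math.log(v) for v in draws]

# the logarithms should have mean close to mu and stdev close to sigma
print(statistics.mean(logs))
print(statistics.stdev(logs))
```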

We should see differences in the histograms based on the distribution algorithm used. Take a
moment to understand the following nine plots:




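[Figure: nine histograms, one per distribution: random.random, random.uniform, random.triangular, random.betavariate, random.expovariate, random.gammavariate, random.lognormvariate, random.normalvariate, random.paretovariate]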

Use seed() with a fixed value to initialize the pseudo-random generator, so that random() produces
the same expected sequence of random values. This is sometimes useful, and it is better than pregenerating random
data and saving it to a file. The latter technique is not always feasible, as it requires saving
(possibly huge amounts of) data on a filesystem.
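A minimal sketch of reproducible sequences with the standard random module; the seed 42 is arbitrary, and getstate()/setstate() are shown as a way to replay the generator from a saved point:

```python
import random

random.seed(42)               # any fixed value works; 42 is arbitrary
first_run = [random.random() for _ in range(5)]

random.seed(42)               # the same seed again ...
second_run = [random.random() for _ in range(5)]

print(first_run == second_run)  # True: ... gives the same sequence

# getstate()/setstate() save and restore the generator's position
state = random.getstate()
a = random.random()
random.setstate(state)
b = random.random()

print(a == b)                   # True
```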

If you want to prevent any repeatability of your randomly generated sequences, we recommend
using random.SystemRandom, which uses os.urandom underneath; os.urandom provides
access to more entropy sources. With this random generator interface, seed() has no effect
and getstate() and setstate() raise NotImplementedError; hence these samples are not reproducible.
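A short sketch of the SystemRandom interface: it offers the same methods as the random module, but calling seed() does nothing and asking for the generator state raises NotImplementedError. The variable names are illustrative:

```python
import random

sys_rand = random.SystemRandom()

# same interface as the random module, but backed by os.urandom()
values = [sys_rand.random() for _ in range(5)]
die = sys_rand.randint(1, 6)

sys_rand.seed(42)  # accepted, but ignored: it has no effect

# getstate()/setstate() are not supported for this generator
try:
    sys_rand.getstate()
    supports_state = True
except NotImplementedError:
    supports_state = False

print(supports_state)  # False
```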

If we want to have some random words, the easiest way (on Linux) is probably to use
/usr/share/dict/words. We can see how that is done in the following example:

import random

with open('/usr/share/dict/words', 'rt') as f:
    words = f.readlines()
words = [w.rstrip() for w in words]

for w in random.sample(words, 5):
    print w

This solution is for Unix-like systems only and will not work on Windows (but it will work on Mac OS).

For Windows, you could use a file constructed from various free sources (Project Gutenberg,
Wiktionary, British National Corpus, or http://norvig.com/big.txt by Dr Peter Norvig).
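A portable variant works with any plain-text source: split the text into words with a regular expression, drop duplicates, and sample from the result. A minimal sketch, assuming lowercase alphabetic runs are good enough as "words"; the helper name random_words and the sample text are made up, and in practice the text would be read from a file such as big.txt:

```python
import random
import re


def random_words(text, k, rng=random):
    """Return k distinct words picked at random from text (hypothetical helper)."""
    # lowercase alphabetic runs count as words; sorted() gives a stable order
    vocabulary = sorted(set(re.findall(r'[a-z]+', text.lower())))
    return rng.sample(vocabulary, k)


sample_text = """The quick brown fox jumps over the lazy dog.
The dog sleeps, the fox runs away."""

picked = random_words(sample_text, 3, random.Random(3))
print(picked)
```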


Smoothing the noise in real-world data 


In this recipe, we introduce a few advanced algorithms to help with cleaning the data coming
from real-world sources. These algorithms are well known in the signal processing world, and
we will not go deep into mathematics but will just exemplify how and why they work and for
what purposes they can be used.


Getting ready 


Data that comes from different real-life sensors usually is not smooth and clean and contains
some noise that we usually don't want to show on diagrams and plots. We want graphs and
plots to be clear, to display information, and to cost viewers minimal effort to interpret.

We don't need any new software installed because we are going to use some already familiar
Python packages: NumPy, SciPy, and matplotlib.

How to do it...


The basic algorithm is based on a rolling window (implemented, for example, as a convolution).

This window rolls over the data and is used to compute the average over that window.

For our discrete data, we use NumPy's convolve function; it returns the discrete linear
convolution of two one-dimensional sequences. We also use NumPy's linspace function,
which generates a sequence of evenly spaced numbers over a specified interval.

The function ones defines an array or matrix (that is, a multidimensional array) where
every element has the value 1. This helps with generating windows for use in averaging.
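A tiny worked example of the pieces just described: a window built with ones() and divided by its length, convolved with a short made-up signal. The 'valid' mode keeps only positions where the window fits completely, while 'same' (the mode the recipe's moving_average() uses) returns an output as long as the input, with edge values pulled toward zero:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values
window_size = 3

# every element of the window is 1/window_size, so convolving with it
# averages window_size neighboring values
window = np.ones(window_size) / float(window_size)

# 'valid': only positions where the window fits completely -> 2, 3, 4
valid_avg = np.convolve(data, window, 'valid')

# 'same': as long as the input; at the edges the window overlaps
# implicit zeros, which pulls the first and last values down -> 1, 2, 3, 4, 3
same_avg = np.convolve(data, window, 'same')

print(valid_avg)
print(same_avg)
```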


How it works...


One simple and naive technique to smooth the noise in data we are processing is to average
over some window (sample) and plot just that average value for the given window instead of
all the data points. This is the basis for more advanced algorithms, as shown here:

from pylab import *
from numpy import *

def moving_average(interval, window_size):
    '''Compute the moving average of interval over a window of the given size
    '''
    window = ones(int(window_size)) / float(window_size)
    return convolve(interval, window, 'same')

t = linspace(-4, 4, 100)
y = sin(t) + randn(len(t))*0.1

plot(t, y, "k.")

# compute moving average
y_av = moving_average(y, 10)
plot(t, y_av, "r")

#xlim(0,1000)

xlabel("Time")
ylabel("Value")
grid(True)

show()


Here, we show how the smoothed line looks compared to the original data points 
(plotted as dots): 



Following on this idea, we can jump ahead to an even more advanced example and use the 
existing SciPy library to make this window smoothing work even better. 

The method we are going to demonstrate is based on the convolution (summation of functions)
of a scaled window with the signal (that is, the data points). This signal is prepared in a clever way:
reflected copies of the signal are added at both ends, so we minimize the boundary
effect. This code is based on the SciPy Cookbook's example, which can be found at
http://scipy-cookbook.readthedocs.org/.

import numpy
from numpy import *
from pylab import *

# possible window types
WINDOWS = ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']

# if you want to see just two window types, comment previous line,
# and uncomment the following one
# WINDOWS = ['flat', 'hanning']

def smooth(x, window_len=11, window='hanning'):
    """
    Smooth the data using a window with requested size.

    Returns smoothed signal.

    x -- input signal
    window_len -- length of smoothing window
    window -- type of window: 'flat', 'hanning', 'hamming',
              'bartlett', 'blackman'
              flat window will produce a moving average smoothing.
    """

    if x.ndim != 1:
        raise ValueError("smooth only accepts 1 dimension arrays.")

    if x.size < window_len:
        raise ValueError("Input vector needs to be bigger than window size.")

    if window_len < 3:
        return x

    if window not in WINDOWS:
        raise ValueError("Window is one of 'flat', 'hanning', 'hamming', "
                         "'bartlett', 'blackman'")

    # adding reflected windows in front and at the end
    s = numpy.r_[x[window_len-1:0:-1], x, x[-1:-window_len:-1]]

    # pick windows type and do averaging
    if window == 'flat':  # moving average
        w = numpy.ones(window_len, 'd')
    else:
        # call appropriate function in numpy (numpy.hanning, numpy.hamming, ...)
        w = getattr(numpy, window)(window_len)

    # NOTE: length(output) != length(input), to correct this:
    # return y[(window_len//2-1):-(window_len//2)] instead of just y.
    y = numpy.convolve(w/w.sum(), s, mode='valid')

    return y


# Get some evenly spaced numbers over a specified interval.
t = linspace(-4, 4, 100)

# Make some noisy sinusoidal
x = sin(t)
xn = x + randn(len(t))*0.1

# Smooth it
y = smooth(x)

# window size
ws = 31

subplot(211)
plot(ones(ws))

# successive plot() calls draw on the same axes
# (older matplotlib versions used hold(True) here; it was removed in 3.0)

# plot for every window
for w in WINDOWS[1:]:
    plot(getattr(numpy, w)(ws))

# configure axis properties
axis([0, 30, 0, 1.1])

# add legend for every window
legend(WINDOWS)

title("Smoothing Windows")

# add second plot
subplot(212)

# draw original signal
plot(x)

# and signal with added noise
plot(xn)

# smooth signal with noise for every possible windowing algorithm
for w in WINDOWS:
    plot(smooth(xn, 10, w))

# add legend for every graph
labels = ['original signal', 'signal with noise']
labels.extend(WINDOWS)

legend(labels)

title("Smoothed signal")

show()

We should see the following two plots, which show how the windowing algorithm influences the
noisy signal. The top plot represents the possible windowing algorithms, and the bottom one
displays every result, from the original signal to the noisy signal, and the smoothed
signal for every windowing algorithm. Try commenting out some window types and
leaving just one or two, to gain a better understanding.
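To see what the padding step inside smooth() actually produces, and why the smoothed output is longer than the input, here is a minimal sketch on a tiny made-up signal. It reuses the same numpy.r_ expression with window_len = 3 and a flat window. Note that the front copy starts at x[window_len-1] and skips x[0], while the back copy starts with x[-1] itself:

```python
import numpy as np

x = np.arange(1, 8)   # a tiny made-up signal: 1, 2, ..., 7
window_len = 3

# the same padding expression as in smooth()
s = np.r_[x[window_len - 1:0:-1], x, x[-1:-window_len:-1]]
print(s)        # [3 2 1 2 3 4 5 6 7 7 6]

# 'valid' convolution of the padded signal with a flat window
w = np.ones(window_len, 'd')
y = np.convolve(w / w.sum(), s, mode='valid')

# the result is window_len - 1 samples longer than the input
print(len(y))   # 9
```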




There's more... 


Another very popular signal smoothing algorithm is the median filter. The main idea of this filter
is to run through the signal entry by entry, replacing each entry with the median of the neighboring
entries. This idea makes the filter fast and usable for one-dimensional datasets as well as for
two-dimensional datasets (such as images).
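The idea is simple enough to write out by hand for one-dimensional data. Below is a minimal sketch of a hypothetical helper, median_filter_1d, assuming zero padding at both ends and an odd kernel size (the scipy.signal.medfilt documentation describes both of these for that function). The made-up input contains one spike:

```python
import numpy as np


def median_filter_1d(x, kernel_size=3):
    """Replace each entry by the median of its neighborhood (hypothetical helper).

    The signal is zero-padded at both ends; kernel_size must be odd.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    padded = np.concatenate([np.zeros(half), np.asarray(x, dtype=float), np.zeros(half)])
    # slide a window of kernel_size entries and take its median
    return np.array([np.median(padded[i:i + kernel_size]) for i in range(len(x))])


spiky = [1, 2, 3, 4, 5, 60, 7, 8]   # made-up data with one spike
print(median_filter_1d(spiky, 3))   # the spike at 60 disappears
```

Unlike a moving average, the spike does not leak into its neighbors; it is simply replaced by a nearby value.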

In the following example, we use the implementation from the SciPy signal toolbox: 


import numpy as np
import pylab as p
import scipy.signal as signal

# get some linear data
x = np.linspace(0, 1, 101)

# add some noisy signal
x[3::10] = 1.5

p.plot(x)
p.plot(signal.medfilt(x, 3))
p.plot(signal.medfilt(x, 5))

p.legend(['original signal', 'length 3', 'length 5'])
p.show()

We see in the following plot that the bigger the window, the more our signal gets distorted as 
compared to the original but the smoother it looks: 



There are many more ways to smooth data (signals) that you receive from external sources.
Which one to use depends a lot on the area you are working in and the nature of the signal. Many algorithms
are specialized for a particular signal, and there may not be a general solution for every case
you encounter.

There is, however, one important question: "When should you not smooth a signal?" One
common situation where you should not smooth signals is prior to statistical procedures, such
as least-squares curve fitting, because all smoothing algorithms are at least slightly lossy and
they change the signal shape. Also, smoothed noise may be mistaken for an actual signal.





















































Drawing Your First Plots and Customizing Them

In this chapter, we will go into a lot more detail and present most of the possibilities of
matplotlib. We will cover the following points:

► Defining plot types - bar, line, and stacked charts

► Drawing simple sine and cosine plots 

► Defining axis lengths and limits 

► Defining plot line styles, properties, and format strings 

► Setting ticks, labels, and grids

► Adding legends and annotations 

► Moving spines to the center 

► Making histograms 

► Making bar charts with error bars 

► Making pie charts count 

► Plotting with filled areas 

► Making stacked plots 

► Drawing scatter plots with colored markers 





Introduction 


Although we have already drawn our first plots using matplotlib, we didn't go into the details
about how they work, how to set them up, or what the possibilities of matplotlib are.
We explore and exercise the most common types of data visualizations: line graphs, bar charts,
histograms, pies, and variations thereof.

matplotlib is a powerful toolbox that satisfies almost all our needs for 2D plotting and some 3D
plotting needs as well. The best way the authors intend for you to learn matplotlib is through
examples. When we need to draw a plot, we look for a similar example and try to change it
to fit our needs. In this way, we are also going to present you with some useful examples, and we
believe these examples will help you find a plot most similar to what you need.


Defining plot types - bar, line, and stacked charts


In this recipe, we will present different basic plots and what they are used for. Most of the
plots described here are used daily, and some of them present the basis for understanding
more advanced concepts in data visualization.


Getting ready 


We start with some common charts from the matplotlib.pyplot library with just sample
datasets; we start with basic charting and lay down the foundations for the following recipes.


How to do it... 


We start by creating a simple plot in IPython. IPython is great because it allows us to interactively 
change plots and see the results immediately. You need to follow these steps for that: 

1. Start IPython by typing the following code at the command prompt: 

$ ipython 

2. Import the necessary functions: 

In [1]: from matplotiib.pyplot import * 

3. Then type the matplotiib plot code: 

In [2] : plot( [1,2,3,2,3,2,2,11 ) 

Out[2]: [<matplotlib.lines.Line2D at 0x412fb50>] 
The plot should open in a new window displaying the default look of the plot and some
supporting information, as shown here:


The basic plot in matplotlib contains the following elements:

► x and y axes: These are the horizontal and vertical axes.

► x and y tickers: These are little tick marks denoting the segments of the axes.
There can be major and minor tickers.

► x and y tick labels: These represent the values on a particular axis.

► Plotting area: This is where the actual plots are drawn.

You will notice that the values we provided to plot() were used as y axis values. plot() provides
default values for the x axis: linear values from 0 to 7 (the number of y values minus 1).

Now, try adding values for the x axis as the first argument to the plot() function. Again in the
same IPython session, type the following:

In [2]: plot([4,3,2,1], [1,2,3,4])

Out[2]: [<matplotlib.lines.Line2D at 0x31444d0>] 



Note how IPython counts input and output lines (In [2] and Out [2]). This 
will help us remember where we are in the current session and enables 
more advanced features such as saving part of the session in a Python 
file. During data analysis, using IPython for prototyping is the fastest way 
to come to a satisfying solution and then save particular sessions into a 
file, to be executed later if you need to reproduce the same plot. 


This will update the plot to look like this image: 



We see here how matplotlib expands the y axis to accommodate the new value range and
automatically changes the color of the second plot line to enable us to distinguish the new plot.

Unless we turn off the hold property (by calling hold(False)), all subsequent plots will
draw over the same axes. This is the default behavior in IPython's pylab mode. (The hold()
function was deprecated in matplotlib 2.0 and removed in 3.0; plots now always draw over the
current axes until we clear them with cla() or clf().)
Let us pack some more common plots and compare them over the same dataset. You can
type this in IPython or run it from a separate Python script:

from matplotlib.pyplot import * 

# some simple data 

x = [1,2,3,4]

y = [5,4,3,2]

# create new figure
figure()

# divide subplots into 2x3 grid
# and select #1
subplot(231)
plot(x, y)

# select #2
subplot(232)
bar(x, y)

# horizontal bar-charts
subplot(233)
barh(x, y)

# create stacked bar charts
subplot(234)
bar(x, y)

# we need more data for stacked bar charts
y1 = [7,8,5,3]
bar(x, y1, bottom=y, color='r')

# box plot
subplot(235)
boxplot(x)

# scatter plot
subplot(236)
scatter(x, y)

show()
This is how the graphs should turn out:



How it works...


With figure(), we create a new figure. If we supply a string argument such as "sample
charts", it will be the backend title of the window. If we call the figure() function with the
same parameter (which can also be a number), we will make the corresponding figure active,
and all the following plotting will be performed on that figure.

Next, we divide the figure into a 2 x 3 grid using a subplot(231) call. We could also call this
as subplot(2, 3, 1), where the first parameter is the number of rows, the second
is the number of columns, and the third represents the plot number.
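Plot numbers run left to right, top to bottom, starting at 1 in the upper-left cell. A minimal sketch of that numbering in plain Python; subplot_position and split_code are hypothetical helpers written for illustration, not matplotlib functions:

```python
def subplot_position(nrows, ncols, index):
    """Map a 1-based plot number to a 0-based (row, column) pair (hypothetical helper)."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index out of range")
    # plots fill the grid row by row
    return divmod(index - 1, ncols)


def split_code(code):
    """Split a three-digit code such as 231 into (2, 3, 1) (hypothetical helper)."""
    return code // 100, (code // 10) % 10, code % 10


print(split_code(231))                     # (2, 3, 1)
print(subplot_position(*split_code(231)))  # (0, 0): top-left
print(subplot_position(2, 3, 4))           # (1, 0): start of the second row
print(subplot_position(2, 3, 6))           # (1, 2): bottom-right
```

The three-digit shorthand only works while rows, columns, and plot number are all single digits, which is why the comma-separated form exists.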

We continue and create common charting types using simple calls to create vertical bar
charts (bar()) and horizontal bars (barh()). For stacked bar charts, we need to tie two
bar chart calls together. We do that by connecting the second bar chart with the previous one
using the parameter bottom=y.

Box plots are created using the boxplot() call, where the box extends from the lower to
the upper quartile, with a line at the median value. We will return to box plots shortly.

We finally create a scatter plot to give you an idea of a point-based dataset. This is probably
more appropriate when we have thousands of data points in a dataset, but here we
wanted to illustrate the difference in representations of the same dataset.


There's more... 


We can return to box plots now, as we need to explain the characteristics of this kind of plot.
A box plot presents, by default, the following elements:

► Box: This is a rectangle that covers the interquartile range

► Median: This is presented as a line inside each box

► Whiskers: These are vertical lines extending to the most extreme values
(excluding outliers)

► Fliers: These are points beyond the whiskers, which are considered outliers

To illustrate this behavior, we will demonstrate plotting the same dataset in a box plot and a
histogram, as shown in the following code:

from pylab import *

dataset = [113, 115, 119, 121, 124,
           124, 125, 126, 126, 126,
           127, 127, 128, 129, 130,
           130, 131, 132, 133, 136]

subplot(121)
boxplot(dataset, vert=False)

subplot(122)
hist(dataset)

show()

That will give us the following plots: 



In the preceding comparison, we can observe a difference in the representation of the same
dataset in two different charts. The one on the left points toward the statistical values
mentioned earlier (the box, the median, the whiskers, and the fliers), while the one on the
right (the histogram) displays the grouping of the dataset in a given range.
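To see the numbers behind the box plot on the left, the quartiles, whiskers, and fliers can be computed for the same 20 values listed in the box plot example. This is a minimal sketch, assuming numpy.percentile with its default linear interpolation and the rule that whiskers reach the most extreme points within 1.5 times the interquartile range of the box (matplotlib's default whis=1.5). matplotlib computes these statistics internally, so treat this as an approximation of what it draws:

```python
import numpy as np

dataset = [113, 115, 119, 121, 124,
           124, 125, 126, 126, 126,
           127, 127, 128, 129, 130,
           130, 131, 132, 133, 136]

q1, median, q3 = np.percentile(dataset, [25, 50, 75])
iqr = q3 - q1   # interquartile range: the height of the box

# assumed rule: whiskers reach the most extreme points within 1.5 * IQR
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
inside = [v for v in dataset if low_fence <= v <= high_fence]
whiskers = (min(inside), max(inside))
fliers = [v for v in dataset if v < low_fence or v > high_fence]

print(q1, median, q3)   # 124.0 126.5 130.0
print(whiskers)         # (115, 136)
print(fliers)           # [113]
```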


Drawing simple sine and cosine plots 


This recipe will go over the basics of plotting mathematical functions and several things that are
related to math graphs, such as writing Greek symbols in labels and on curves.


Getting ready 


The most common graph we will use is the line plot command, which draws the given (x,y)
coordinates on a figure plot.

We start with computing sine and cosine functions over the same linear interval, from -Pi to Pi,
with 256 points in between, and we plot the values for sin(x) and cos(x) on the same plot, as
shown here:

import matplotlib.pyplot as pl
import numpy as np

x = np.linspace(-np.pi, np.pi, 256, endpoint=True)

y = np.cos(x)
y1 = np.sin(x)

pl.plot(x, y)
pl.plot(x, y1)

pl.show()

That will give us the following graph:





Following this simple plot, we can customize it further to give more information and be more
precise about axes and boundaries, as shown here:

from pylab import * 
import numpy as np 

# generate uniformly distributed 

# 256 points from -pi to pi, inclusive 

x = np.linspace(-np.pi, np.pi, 256, endpoint=True)

# these are vectorised versions
# of math.cos, and math.sin in built-in Python maths

# compute cos for every x
y = np.cos(x)

# compute sin for every x
y1 = np.sin(x)

# plot cos
plot(x, y)

# plot sin
plot(x, y1)

# define plot title
title(r"Functions $\sin$ and $\cos$")

# set x limit
xlim(-3.0, 3.0)

# set y limit
ylim(-1.0, 1.0)

# format ticks at specific values
xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],
       [r'$-\pi$', r'$-\pi/2$', r'$0$', r'$+\pi/2$', r'$+\pi$'])
yticks([-1, 0, +1],
       [r'$-1$', r'$0$', r'$+1$'])

show()
That should give us a slightly nicer graph:



We see that we used expressions such as $\sin$ or $-\pi$ to write function names and letters of the
Greek alphabet in figures. This is LaTeX syntax, which we will explore further in the following
chapters. Here, we just illustrated how easy it is to make your math charts more readable for
certain audiences.


Defining axis lengths and limits 


This recipe will demonstrate a variety of useful axis properties around limits and lengths that
we can configure in matplotlib.


Getting ready 


For this recipe, we want to fire up IPython: 

$ ipython 

After this, we need to import the plotting functions right away:

from matplotlib.pylab import *
How to do it...


Start experimenting with various properties of axes. Just calling an empty axis() function will return the default values for the axis:

In [1]: axis() 

Out[1]: (0.0, 1.0, 0.0, 1.0)

Note that if you are in interactive mode and are using a windowing backend, a figure with an empty axis will be displayed.

Here the values represent xmin, xmax, ymin, and ymax, respectively. Similarly, we can set values for the x and y axes:

In [2]: limits = [-1, 1, -10, 10]

In [3]: axis(limits)

Out[3]: [-1, 1, -10, 10]

Again, if you are in interactive mode, this will update the same figure. Furthermore, we can also update any value separately using keyword arguments (**kwargs), for example, setting just xmax to a certain value.
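A minimal sketch of the keyword form, assuming a recent matplotlib whose axis() accepts the xmin, xmax, ymin, and ymax keywords. The Agg backend line is an assumption so the sketch runs without a display; drop it in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt

plt.figure()
plt.axis([-1, 1, -10, 10])   # set all four limits at once
limits = plt.axis(xmax=2)    # update only xmax through a keyword argument
print(limits)                # the other three limits keep their previous values
```

Because axis() returns the current limits as (xmin, xmax, ymin, ymax), printing the result is a quick way to confirm that only xmax changed.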


How it Works... 


If we don't use axis() or other settings, matplotlib will automatically use minimum values that allow us to see all data points on one plot. If we set axis() limits to be narrower than the range of values in a dataset, matplotlib will do as told and we will not see all points on the figure. This can be a source of confusion or even error, where we think we see everything we drew. One way to avoid this is to call autoscale() (matplotlib.pyplot.autoscale()), which will compute the optimal size of the axes to fit the data to be displayed.
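To see the clipping problem and its fix side by side, here is a small sketch with made-up data (the Agg backend is again an assumption so it runs without a display): the manual limits hide most of the parabola, and autoscale() brings it back into view.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
plt.figure()
plt.plot(x, x ** 2)            # the data reaches y = 100

plt.axis([0, 5, 0, 10])        # limits too small: most points fall outside the view
clipped = plt.axis()

plt.autoscale()                # re-enable autoscaling and recompute the limits from the data
fitted = plt.axis()
print(clipped, fitted)
```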

If we want to add new axes to the same figure, we can use matplotlib.pyplot.axes(). We usually want to add some properties to this default call; for example, rect, which holds left, bottom, width, and height in normalized units (0, 1), and maybe axisbg, which specifies the background color of the axes (since matplotlib 2.0 this argument is called facecolor, and axisbg no longer works in current versions).

There are also other properties that we can set for added axes, such as sharex/sharey, which accept other axes instances and make the current axes share their x/y axis with them, or the parameter polar, which defines whether we want to use polar axes.

Adding new axes can be useful; for example, to combine multiple charts on one figure when there is a need to tightly couple different views of the same data to illustrate its properties.
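A sketch of such a coupled layout under a few assumptions: the rect values and colors are arbitrary, facecolor is used because current matplotlib expects it in place of axisbg, and the Agg backend is selected only so the code runs headless. The inset shares its x axis with the main axes through sharex.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 256)

fig = plt.figure()
# main axes: rect = [left, bottom, width, height] in normalized figure units (0, 1)
main = plt.axes([0.1, 0.1, 0.8, 0.8])
main.plot(x, np.sin(x))

# a smaller inset that shares the x axis with the main axes
inset = plt.axes([0.6, 0.6, 0.25, 0.25], facecolor="lightyellow", sharex=main)
inset.plot(x, np.cos(x))
```

Because the x axis is shared, panning or zooming along x in one axes updates the other as well.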
If we want to add just one line to the current figure, we can use matplotlib.pyplot.axhline() or matplotlib.pyplot.axvline(). The functions axhline() and axvline() draw horizontal and vertical lines across the axes at given y and x data values, respectively. They share similar parameters, the most important ones being the y position, xmin, and xmax for axhline(), and the x position, ymin, and ymax for axvline(). Note that xmin/xmax and ymin/ymax are fractions of the axes width/height (0 to 1), not data values.

Let's see how it looks as a figure, continuing in the same IPython session: 

In [4]: axhline()

Out[4]: <matplotlib.lines.Line2D at 0x414ecd0>

In [5]: axvline()

Out[5]: <matplotlib.lines.Line2D at 0x4152490>

In [6]: axhline(4)

Out[6]: <matplotlib.lines.Line2D at 0x4152850>

We should have a figure like the following plot:



Here we see that just calling these functions without parameters makes them take default values and draw a horizontal line at y=0 (axhline()) and a vertical line at x=0 (axvline()).
Similar to these are two related functions that allow us to add a horizontal or vertical span (rectangle) across the axes: matplotlib.pyplot.axhspan() and matplotlib.pyplot.axvspan(). The function axhspan() has ymin and ymax as required parameters that define how wide the horizontal span is. Analogously, axvspan() has xmin and xmax to define the width of the vertical span.
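The following sketch draws reference lines and spans on one axes. The positions, colors, and the Agg backend are illustrative choices, not part of the recipe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

ax.axhline(0, color="gray")                           # horizontal line at y = 0
ax.axvline(5, ymin=0.25, ymax=0.75, color="gray")     # ymin/ymax are fractions of the axes height
band = ax.axhspan(-0.5, 0.5, alpha=0.2)               # horizontal band between y = -0.5 and y = 0.5
column = ax.axvspan(2, 3, color="orange", alpha=0.3)  # vertical band between x = 2 and x = 3
```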


There's more... 


Having a grid in a figure is turned off by default, but it can easily be switched on and customized. A default call to matplotlib.pyplot.grid() will toggle the grid's visibility. Other parameters for control are as shown here:

► which: This defines what grid tick type to draw (can be major, minor, or both)

► axis: This defines which set of grid lines are drawn (can be both, x, or y)
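As a short sketch (assuming a recent matplotlib and the Agg backend so it runs headless), this turns on a solid major grid for both axes and a dotted minor grid for the y axis only. Minor ticks are switched on first, since minor gridlines follow the minor ticks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

ax.minorticks_on()                                   # minor ticks are off by default
ax.grid(which="major", axis="both", linestyle="-")   # solid major grid on both axes
ax.grid(which="minor", axis="y", linestyle=":")      # dotted minor grid for y only
```

Passing style keywords such as linestyle implies that the grid should be shown, so these calls switch it on rather than toggling it.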

Axes are usually controlled via matplotlib.pyplot.axis(). Internally, axes are represented by several Python classes; the parent one is matplotlib.axes.Axes, which contains most methods to manipulate axes. A single axis is represented by the matplotlib.axis.Axis class, where the x axis uses matplotlib.axis.XAxis and the y axis uses the matplotlib.axis.YAxis class.

We don't need to use these to perform our recipe, but it is important to know where to look when we need more advanced axis control and hit the limits of what is available via the matplotlib.pyplot namespace.


Defining plot line styles, properties, and 
format strings 


This recipe shows how we can change various line properties such as style, color, or width. Having lines set up appropriately according to the information presented, and distinct enough for target audiences (if the audience is a younger population, we may want to target them with more vivid colors; if they are older, we may want to use more contrasting colors), can make the difference between being barely noticeable and leaving a great impact on the viewer.


Getting ready 


Although we stressed how important it is to aesthetically tune your presentation, we first must 
learn how to do it. 

If you don't have a particular eye for color matching, there are free and commercial online tools that can generate color sets for you. One of the most well known is Colorbrewer2, which can be found at http://colorbrewer2.org/.
Some serious research has been conducted on the usage of color in data visualizations, but explaining that theory is out of the scope of this book. The material on the topic is a must-read if you work with more advanced visualizations daily.


How to do it... 


Let's learn how to change line properties. We can change the lines in our plots using different 
methods and approaches. 

The first and most common method is to define lines by passing keyword parameters to functions such as plot():

plot(x, y, linewidth=1.5)

Because a call to plot() returns a list of line instances (matplotlib.lines.Line2D), here with a single element that we unpack, we can use a set of setter methods on that instance to set various properties:

line, = plot(x, y) 
line.set_linewidth(1.5) 

Those who have used MATLAB will feel the need to use a third way of configuring line properties, the setp() function:

lines = plot(x, y) 
setp(lines, 'linewidth', 1.5) 

Another way to use setp() is this:
setp(lines, linewidth=1.5)

Whichever way you prefer to configure lines, choose one method and stay consistent for the whole project (or at least a file). This way, when you (or someone else in the future) come back to the code, it will be easier to make sense of it and change it.
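The three approaches can be compared in one short sketch (made-up data; the Agg backend is assumed only so it runs headless). Each line ends up with the same width, whichever mechanism set it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# 1. keyword argument passed to plot()
line_a, = plt.plot(x, y, linewidth=1.5)

# 2. setter method on the returned Line2D instance
line_b, = plt.plot(x, y + 1)
line_b.set_linewidth(1.5)

# 3. MATLAB-style setp() on the returned list of lines
lines_c = plt.plot(x, y + 2)
plt.setp(lines_c, linewidth=1.5)

print([line.get_linewidth() for line in (line_a, line_b, lines_c[0])])
```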


How it Works... 


All the properties we can change for a line are contained in the matplotlib.lines.Line2D class. We list some of them in the following table:


Property                  Value type                        Description
alpha                     float                             Sets the alpha value used for blending; not supported on all backends.
color or c                Any matplotlib color              Sets the color of the line.
dashes                    Sequence of on/off ink in points  Sets the dash sequence, the sequence of dashes with on/off ink in points.
                                                            If seq is empty or if seq = (None, None), linestyle will be set to solid.
label                     Any string                        Sets the label to s for auto legend.
linestyle or ls           ['-' | '--' | '-.' | ':' |        Sets the linestyle of the line (also accepts drawstyles).
                          'steps' | ...]
linewidth or lw           float value in points             Sets the line width in points.
marker                    [7 | 4 | 5 | 6 | 'o' | 'D' |      Sets the line marker.
                          'h' | 'H' | '_' | '' | 'None' |
                          ' ' | None | '8' | 'p' | ',' |
                          '+' | '.' | 's' | '*' | 'd' |
                          3 | 0 | 1 | 2 | '1' | '3' | '4' |
                          '2' | 'v' | '<' | '>' | '^' |
                          '|' | 'x' | '$...$' | tuple |
                          Nx2 array]
markeredgecolor or mec    Any matplotlib color              Sets the marker edge color.
markeredgewidth or mew    float value in points             Sets the marker edge width in points.
markerfacecolor or mfc    Any matplotlib color              Sets the marker face color.
markersize or ms          float                             Sets the marker size in points.
solid_capstyle            ['butt' | 'round' |               Sets the cap style for solid line styles.
                          'projecting']
solid_joinstyle           ['miter' | 'round' | 'bevel']     Sets the join style for solid line styles.
visible                   [True | False]                    Sets the artist's visibility.
xdata                     np.array                          Sets the data np.array for x.
ydata                     np.array                          Sets the data np.array for y.
zorder                    Any number                        Sets the z axis order for the artist. Artists with lower zorder
                                                            values are drawn first.

If x and y are axes going horizontal to the right and vertical to the top of the screen, the z axis is the one extending toward the viewer. So a value of 0 would be at the screen, 1 one layer above, and so on.


The following table shows some line styles:


Linestyle           Description
'-'                 Solid
'--'                Dashed
'-.'                Dash_dot
':'                 Dotted
'None', ' ', ''     Draw nothing


The following table shows line markers:


Marker                   Description
'o'                      Circle
'D'                      Diamond
'h'                      Hexagon1
'H'                      Hexagon2
'_'                      Horizontal line
'', 'None', ' ', None    Nothing
'8'                      Octagon
'p'                      Pentagon
','                      Pixel
'+'                      Plus
'.'                      Point
's'                      Square
'*'                      Star
'd'                      Thin_diamond
'v'                      Triangle_down
'<'                      Triangle_left
'>'                      Triangle_right
'^'                      Triangle_up
'|'                      Vertical line
'x'                      X
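The recipe title also mentions format strings. As a short sketch, plot() accepts an optional format string that packs a color alias, a line style, and a marker from the tables above into one argument; 'g--o' below asks for a green dashed line with circle markers. The data and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# 'g' = green, '--' = dashed line, 'o' = circle markers
line, = plt.plot([1, 2, 3], [3, 1, 2], "g--o")

print(line.get_linestyle(), line.get_marker(), mcolors.to_rgb(line.get_color()))
```

A format string is convenient for quick plots; the keyword form (color=..., linestyle=..., marker=...) is more explicit and is easier to read in larger projects.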


Color 

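In the matplotlib versions this book targets, the basic colors that matplotlib supports are documented by matplotlib.pyplot.colors() (the same single-letter aliases are also available in the matplotlib.colors.BASE_COLORS dictionary); they are as follows: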


Alias    Color
b        Blue
g        Green
r        Red
c        Cyan
m        Magenta
y        Yellow
k        Black
w        White


These colors can be used in the many matplotlib functions that take color arguments.

As we progress, these basic colors will not be enough, so we can use other ways of defining a color value. We can use an HTML hexadecimal string as shown here:

color = '#eeefff' 

We can also use legal HTML color names ('red', 'chartreuse'). We can also pass an RGB tuple normalized to [0, 1]:

color = (0.3, 0.3, 0.4) 

The argument color is accepted by a range of functions such as title():

title('Title in a custom color', color='#123456') 
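To check how these notations map to actual colors, matplotlib.colors.to_rgb() converts any of them to a normalized RGB tuple. A small sketch, using the example values from the text plus the 'b' alias:

```python
import matplotlib.colors as mcolors

# hex string, HTML name, normalized RGB tuple, and a single-letter alias
specs = ["#eeefff", "chartreuse", (0.3, 0.3, 0.4), "b"]
for spec in specs:
    print(spec, "->", mcolors.to_rgb(spec))

# the single-letter aliases live in this dictionary
print(mcolors.BASE_COLORS["b"])
```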








Background color 

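By providing axisbg to a function such as matplotlib.pyplot.axes() or matplotlib.pyplot.subplot(), we can define the background color of an axis. Since matplotlib 2.0 this argument is called facecolor, and axisbg no longer works in current versions, so the call looks like this:

subplot(111, facecolor=(0.1843, 0.3098, 0.3098))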


Setting ticks, labels, and grids


In this recipe, we will continue with setting axis and line properties and adding more data to our figures and charts.


Getting ready 


Let's learn a bit about figures and subplots.

In matplotlib, figure() is used to explicitly create a figure, which represents a user interface window. Figures are also created implicitly just by calling plot() or similar functions. This is fine for simple charts, but having the ability to explicitly create a figure and get a reference to its instance is very useful for more advanced use.

A figure contains one or more subplots. Subplots allow us to arrange plots in a regular grid.

We already used subplot(), in which we specify the number of rows and columns and the number of the plot we are referring to.

If we want more control, we need to use axes instances from the matplotlib.axes.Axes class. They allow us to place plots at any location in the figure. An example of this would be to put a smaller plot inside a bigger one.
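A minimal sketch of these three ideas together: an explicitly created figure, a 1 x 2 grid of subplots, and a freely placed Axes acting as an inset. The sizes, positions, and data are arbitrary, and the Agg backend is assumed only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure(figsize=(8, 4))   # explicit figure; we keep a reference to it

# subplot(rows, columns, index): first and second cell of a 1 x 2 grid
left = fig.add_subplot(1, 2, 1)
left.plot(x, np.sin(x))
right = fig.add_subplot(1, 2, 2)
right.plot(x, np.cos(x))

# a freely placed Axes: rect = [left, bottom, width, height] in figure coordinates,
# here a small plot drawn on top of the right-hand subplot
inset = fig.add_axes([0.75, 0.6, 0.12, 0.2])
inset.plot(x, x ** 2)
```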


How to do it... 


Ticks are part of figures. They consist of tick locators, which determine where ticks appear, and tick formatters, which determine how ticks appear. There are major and minor ticks. Minor ticks are not visible by default. More importantly, major and minor ticks can be formatted and located independently of each other.

We can use matplotlib.pyplot.locator_params() to control the behavior of tick locators. Even though tick locations are usually determined automatically, we can control the number of ticks and use a tight view if we want to when plots are smaller:

from pylab import *
import numpy as np

# get current axis
ax = gca()

# set view to tight, and maximum number of tick intervals to 10
ax.locator_params(tight=True, nbins=10)

# generate 100 normal distribution values
ax.plot(np.random.normal(10, .1, 100))

show()

This should give us the following graph:



We see how the x and y axes are divided and what values are shown. We could have achieved the same setup using locator classes. Here we are saying "set the major locator to be a multiple of 10":

ax.xaxis.set_major_locator(matplotlib.ticker.MultipleLocator(10))

Tick formatters can similarly be specified. Formatters specify how the values (usually numbers) are displayed. For example, matplotlib.ticker.FormatStrFormatter simply specifies '%2.1f' or '%1.1f cm' as the string to be used as the label for the ticker.
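Here is a compact sketch that sets both a locator and a formatter with the classes named above (the random data, the '%1.1f cm' unit, and the Agg backend are illustrative assumptions). A formatter is callable, so we can check directly what label it produces for a value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(0, 50), np.random.normal(10, 0.1, 50))

# major ticks on the x axis at every multiple of 10
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
# y tick labels rendered with a printf-style format string
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter("%1.1f cm"))

fmt = ax.yaxis.get_major_formatter()
print(fmt(3.14159))   # formats a single value the way a tick label would look
```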
Let's take a look at one example using dates.

matplotlib represents dates as floating point values: the time in days passed since 0001-01-01 UTC, plus 1. So, 0001-01-01 UTC 06:00 is 1.25. (This is the convention of the matplotlib versions this recipe was written for; since matplotlib 3.3, the default epoch is 1970-01-01, and matplotlib.dates.set_epoch() can change it.)

Then we can use helper functions such as matplotlib.dates.date2num(), matplotlib.dates.num2date(), and matplotlib.dates.drange() to convert dates between different representations.
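A small sketch of these helpers. Because the epoch differs between matplotlib versions, the absolute numbers returned by date2num() depend on the version, so the sketch only relies on differences between dates (one day is 1.0, six hours is 0.25). The dates are arbitrary examples.

```python
import datetime
import matplotlib.dates as mdates

start = datetime.datetime(2013, 1, 1, tzinfo=datetime.timezone.utc)
later = start + datetime.timedelta(hours=6)

# date2num: datetime -> float number of days relative to matplotlib's epoch
n_start = mdates.date2num(start)
n_later = mdates.date2num(later)
print(n_later - n_start)          # six hours is a quarter of a day

# num2date: float -> timezone-aware datetime (UTC by default)
print(mdates.num2date(n_later))

# drange: evenly spaced date numbers; the end date itself is excluded
days = mdates.drange(datetime.datetime(2013, 1, 1),
                     datetime.datetime(2013, 12, 31),
                     datetime.timedelta(days=1))
print(len(days))
```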

Let's see another example:

from pylab import *
import matplotlib as mpl
import matplotlib.dates  # makes mpl.dates available
import datetime

fig = figure()

# get current axis
ax = gca()

# set some date range
start = datetime.datetime(2013, 1, 1)
stop = datetime.datetime(2013, 12, 31)
delta = datetime.timedelta(days=1)

# convert dates for matplotlib
dates = mpl.dates.drange(start, stop, delta)

# generate some random values
values = np.random.rand(len(dates))

# create plot with dates
# (plot_date is deprecated in recent matplotlib releases; there,
# ax.xaxis_date() followed by ax.plot() does the same job)
ax.plot_date(dates, values, linestyle='-', marker='')

# specify formatter
date_format = mpl.dates.DateFormatter('%Y-%m-%d')

# apply formatter
ax.xaxis.set_major_formatter(date_format)

# autoformat date labels
# rotates labels by 30 degrees by default
# use the rotation param to specify a different rotation degree
# use the bottom param to give more room to date labels
fig.autofmt_xdate()

show()

The preceding code will give us the following graph: 



Adding legends and annotations 


Legends and annotations explain data plots clearly and in context. By assigning each plot a short description of what data it represents, we enable an easier mental model in the reader's (viewer's) head. This recipe will show how to annotate specific points on our figures and how to create and position data legends.


Getting ready 


How many times have you looked at a chart and wondered what the data represents?

More often than not, newspapers and other daily and weekly publications create plots that don't contain appropriate legends, thus leaving the reader free to interpret the representation. This creates ambiguity for the readers and increases the possibility of error.
How to do it...


Let's demonstrate how to add legends and annotations with the following example: 

from matplotlib.pyplot import *
import numpy as np

# generate different normal distributions
x1 = np.random.normal(30, 3, 100)
x2 = np.random.normal(20, 2, 100)
x3 = np.random.normal(10, 3, 100)

# plot them
plot(x1, label='1st plot')
plot(x2, label='2nd plot')
plot(x3, label='last plot')

# generate a legend box
legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
       ncol=3, mode="expand", borderaxespad=0.)

# annotate an important value
annotate("Important value", (55, 20), xycoords='data',
         xytext=(5, 38),
         arrowprops=dict(arrowstyle='->'))

show()

The preceding code will give us the following plot: 
What we do is assign a string label to every plot, so legend() will try to determine what to add to the legend box.

We set the location of the legend box by defining the loc parameter. This is optional, but we want to specify a location where the legend box is least likely to be drawn over plot lines. Setting the location value to 0 ('best') is very useful, as it automatically detects the location in the figure where the legend fits with minimal overlap with the plot.


How it Works... 


All location parameter strings are given in the following table:


String          Number value
best            0
upper right     1
upper left      2
lower left      3
lower right     4
right           5
center left     6
center right    7
lower center    8
upper center    9
center          10


To keep a plot's label out of the legend, set the label to _nolegend_.
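A minimal sketch of the _nolegend_ label (made-up data; the Agg backend is assumed only so it runs headless). Only the first line appears in the legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], label="shown in the legend")
ax.plot([3, 2, 1], label="_nolegend_")   # this line is left out of the legend
leg = ax.legend(loc="best")              # same as loc=0

print([text.get_text() for text in leg.get_texts()])
```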

For the legend, we defined the number of columns with ncol=3 and set the location to lower left (loc=3). We specified a bounding box (bbox_to_anchor) that starts at position (0., 1.02) and has a width of 1 and a height of 0.102. These are normalized axes coordinates. The parameter mode is either None or expand, which allows the legend box to expand horizontally, filling the axes area. The parameter borderaxespad defines the padding between the axes and the legend border.

For annotations, we have defined a string to be drawn on the plot at a coordinate xy. The coordinate system is specified to be the same as the data one; therefore, the coordinate system is xycoords='data'. The starting position for the text is defined by the value of xytext.

An arrow is drawn from the xytext coordinate to the xy coordinate, and the arrowprops dictionary can define many properties for that arrow. For this example, we used arrowstyle to define the arrow style.
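A short sketch of the same annotate() call with every coordinate argument spelled out. Here the arrow points at an actual data point (y[55]) rather than a fixed value, which is an illustrative choice; textcoords='data' makes explicit that xytext is in data units too. The Agg backend is assumed only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(100)
y = np.random.normal(30, 3, 100)

fig, ax = plt.subplots()
ax.plot(x, y)

# the arrow tip sits at xy; the text is placed at xytext; both in data coordinates
note = ax.annotate("Important value", xy=(55, y[55]), xycoords="data",
                   xytext=(5, 38), textcoords="data",
                   arrowprops=dict(arrowstyle="->"))
```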
Moving spines to the center


This recipe will demonstrate how to move spines to the center. 

Spines define data area boundaries; they connect the axis tick marks. There are four spines. We can place them wherever we want; by default, they are placed on the border of the axis, hence we see a box around our data plot.


How to do it... 


To move the spines to the center of the plot, we need to remove two spines by making them hidden (setting their color to none). After that, we move the other two to coordinate (0,0). The coordinates are specified in data space coordinates.

The following code shows how to do this:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 500, endpoint=True)
y = np.sin(x)

plt.plot(x, y)

ax = plt.gca()

# hide two spines
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

# move bottom and left spine to 0,0
ax.spines['bottom'].set_position(('data', 0))
ax.spines['left'].set_position(('data', 0))

# move ticks positions
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

plt.show()
This is what the plot will look like:



How it Works... 


This code depends on the plot that is drawn, because we are moving spines to the location (0, 0) and are plotting a sine function on an interval where (0, 0) is in the middle of the plot.

Nevertheless, this demonstrates how to move spines to a particular location and how to get rid of spines we don't want to show.


There's more... 


Furthermore, spines can be limited to end where the data ends (for example, using a set_smart_bounds(True) call). In this case, matplotlib tries to set bounds in a sophisticated way (for example, to handle inverted limits or to clip the line to the view if data extends past the view). Note that set_smart_bounds() was deprecated and later removed in newer matplotlib releases.
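For newer matplotlib versions, a sketch of a manual alternative: Spine.set_bounds() limits a spine to an explicit data range, here the extent of the sine data. This is a simpler, non-automatic substitute for set_smart_bounds(), not the same algorithm. The sketch also hides spines with set_visible(False), which has the same visual effect as setting their color to none; the Agg backend is assumed only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 500)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_position(("data", 0))
ax.spines["left"].set_position(("data", 0))

# end the remaining spines where the data ends instead of at the view limits
ax.spines["bottom"].set_bounds(x.min(), x.max())
ax.spines["left"].set_bounds(-1, 1)
```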
Making histograms


Histograms are simple, yet it's important to get the right data into them. We will cover histograms in 2D for now.

Histograms are used to visualize estimations of the distribution of data. Generally, we use a few terms when speaking of histograms. Vertical rectangles represent frequencies of data points within a particular interval called a bin. Bins are created at fixed intervals, so the heights of all the bars (the counts) sum to the number of data points.

Instead of using absolute values of data, histograms can display relative frequencies of data. 
When this is the case, the total area equals 1. 

Histograms are often used in image manipulation software as a way to visualize image properties such as the distribution of light in a particular color channel. Further, these image histograms can be used in computer vision algorithms to detect peaks, aiding in edge detection, image segmentation, and so on.

In Chapter 5, Making 3D Visualizations, we have recipes that deal with 3D histograms. 


Getting ready 


The number of bins is the value we want to get right, but it is hard to get right because there are no strict rules on what the optimal number of bins is. There are different theories on how to calculate the number of bins, the simplest being the one based on a ceiling function, where the number of bins k is equal to ceil((max(x) - min(x)) / h), where x is the dataset plotted and h is the desired bin width. This is just one option, as the number of bins required to display data properly depends on the real data distribution.
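The ceiling rule is easy to compute directly. A sketch with made-up normal data and an arbitrary bin width h = 5 (both assumptions for illustration): np.histogram() then spreads k equal bins over the data range, so the actual bin width comes out at most h.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(100, 15, 10000)

h = 5.0                                        # desired bin width (arbitrary choice)
k = math.ceil((data.max() - data.min()) / h)   # number of bins from the ceiling rule

counts, edges = np.histogram(data, bins=k)
print(k, counts.sum())                         # the counts add up to the number of data points
```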


How to do it... 


We create a histogram by calling matplotlib.pyplot.hist() with a set of parameters. Here are some of the most useful ones:

► bins: This is either an integer number of bins or a sequence giving the bins. The default is 10.

► range: This is the range of bins and is not used if bins are given as a sequence. Outliers are ignored and the default is None.

► normed: If the value for this is True, histogram values are normalized and form a probability density. The default is False. (Newer matplotlib versions call this parameter density.)

► histtype: This parameter allows us to specify the type of histogram that we want. The default value is 'bar' and the other options are shown here:

□ barstacked: This gives stacked-view histograms for multiple data

□ step: This creates a line plot that is left unfilled

□ stepfilled: This creates a line plot that is filled by default

► align: This centers bars between bin edges. The default is mid. Other values are left and right.

► color: This specifies the color of the histogram. It may be a single value or a sequence of colors. If multiple datasets are specified, the color sequence will be used in the same order. If not specified, a default line color sequence is used.

► orientation: This allows the creation of horizontal histograms by setting orientation to horizontal. The default is vertical.
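A sketch combining several of these parameters. It assumes a recent matplotlib, so it uses density=True where older versions used normed=True; the data and the Agg backend are illustrative assumptions. With density=True, the bar heights times the bin widths add up to 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(100, 15, 10000)
fig, ax = plt.subplots()

# hist() returns the bar heights, the bin edges, and the drawn artists
n, bins, patches = ax.hist(x, bins=35, density=True, histtype="step",
                           orientation="vertical", color="r")

area = np.sum(n * np.diff(bins))
print(area)   # total area under a density histogram
```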

The following code demonstrates how hist() is used:

import numpy as np
import matplotlib.pyplot as plt

mu = 100
sigma = 15
x = np.random.normal(mu, sigma, 10000)

ax = plt.gca()

# the histogram of the data
ax.hist(x, bins=35, color='r')

ax.set_xlabel('Values')
ax.set_ylabel('Frequency')

ax.set_title(r'$\mathrm{Histogram:}\ \mu=%d,\ \sigma=%d$' % (mu, sigma))

plt.show()
This creates a neat, red-colored histogram for our data sample:


How it Works... 


We start by generating some normally distributed data. The histogram is plotted with the specified number of bins (35) and is left unnormalized, so the y axis shows counts (setting normed to True, or density=True in newer versions, would normalize it instead); we set the color to red ('r').

After that, we set labels and a title for the plot. Here we used the ability to write LaTeX expressions to write math symbols and mixed that with Python format strings.


Making bar charts with error bars 


In this recipe, we will show how to create bar charts and how to draw error bars.


Getting ready 


To visualize the uncertainty of measurement in our dataset or to indicate the error, we can use error bars. Error bars can easily give an idea of how error free the dataset is. They can show one standard deviation, one standard error, or a 95 percent confidence interval. There is no standard here, so always explicitly state what values (errors) the error bars display. Most papers in the experimental sciences should contain error bars to present the accuracy of the data.
How to do it...


Even though just two parameters are mandatory, left and height (newer matplotlib versions call the first one x), we often want to use more than that. Here are some parameters we can use:

► width: This gives the width of the bars. The default value is 0.8.

► bottom: If bottom is specified, the value is added to the height. The default is None.

► edgecolor: This gives the color of the bar edges.

► ecolor: This specifies the color of any error bar.

► linewidth: This gives the width of bar edges; special values are None (use defaults) and 0 (when bar edges are not displayed).

► orientation: This has two values, vertical and horizontal.

► xerr and yerr: These are used to generate error bars on the bar chart.

Some optional arguments (color, edgecolor, linewidth, xerr, and yerr) can be single values or sequences with the same length as the number of bars.


How it Works... 


Let's illustrate this using an example:

import numpy as np
import matplotlib.pyplot as plt

# generate number of measurements
# (start at 1, because log(0) is undefined)
x = np.arange(1, 11, 1)

# values computed from "measured"
y = np.log(x)

# add some error samples from standard normal distribution
xe = 0.1 * np.abs(np.random.randn(len(y)))

# draw and show errorbar
plt.bar(x, y, yerr=xe, width=0.4, align='center', ecolor='r',
        color='cyan', label='experiment #1')

# give some explanations
plt.xlabel('# measurement')
plt.ylabel('Measured values')
plt.title('Measurements')
plt.legend(loc='upper left')

plt.show()

The preceding code will plot the following diagram: 





To be able to plot an error bar, we needed to have some measures (x); for every measure 
computed (y), we introduced errors (xe). 

We used NumPy to generate and compute values; standard distributions are good enough for demonstration purposes, but if you happen to know your data distribution in advance, you can always make some prototype visualizations and try out different layouts to find the best options to present information.

Another interesting option to use if we are preparing visualizations for a black-and-white 
medium is hatch; it can have the following values: 


Hatch value    Description
/              Diagonal hatching
\              Back diagonal
|              Vertical hatching
-              Horizontal
+              Crossed
x              Crossed diagonal
o              Small circle
O              Large circle
.              Dot pattern
*              Star pattern
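A sketch of hatching for a black-and-white medium: white bars with black edges, distinguished only by their hatch patterns. The data and the Agg backend are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)
fig, ax = plt.subplots()

# two series side by side; the hatch pattern, not the fill color, tells them apart
bars_a = ax.bar(x - 0.2, x, width=0.4, color="white", edgecolor="black", hatch="/")
bars_b = ax.bar(x + 0.2, x[::-1], width=0.4, color="white", edgecolor="black", hatch="o")
```

Repeating a hatch character (for example "//") makes the pattern denser.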


There's more... 


What we have just used are known as symmetrical error bars. If the nature of our dataset is such that errors are not the same in both directions (negative and positive), we can also specify them separately using asymmetrical error bars.

All we have to do differently is to specify xerr or yerr using a two-element list (such as a 2D array with two rows), where the first list contains values for negative errors and the second one for positive errors.
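A minimal sketch of asymmetrical error bars, reusing the log-shaped values from the example with made-up constant errors: the yerr array has shape (2, N), first row for the negative direction and second row for the positive one. The Agg backend is assumed only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)
y = np.log(x)

lower = 0.05 * np.ones_like(y)     # errors in the negative direction
upper = 0.15 * np.ones_like(y)     # errors in the positive direction
yerr = np.vstack([lower, upper])   # shape (2, N): row 0 negative, row 1 positive

fig, ax = plt.subplots()
bars = ax.bar(x, y, yerr=yerr, width=0.4, align="center", ecolor="r", color="cyan")
```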


Making pie charts count 


Pie charts are special in many ways, the most important being that the dataset they display must sum up to 100 percent or they are just not valid.


Getting ready 


Pie charts represent numerical proportions, where the arc length of each segment is proportional to the quantity it represents.

They are compact and can look very aesthetically pleasing, but they have been criticized because they can be hard to compare. Another property of pie charts that does not work in their favor is that pie charts are presented at a specific angle (perspective), and segments use certain colors that can skew our perception and influence our conclusions about the information presented.

What we will show here are different ways to use pie charts to present data.


How to do it...


Here, we create a so-called exploded pie chart: 
from pylab import *

# make a square figure and axes
figure(1, figsize=(6, 6))
ax = axes([0.1, 0.1, 0.8, 0.8])

# the slices will be ordered
# and plotted counter-clockwise.
labels = 'Spring', 'Summer', 'Autumn', 'Winter'

# fractions are either x/sum(x) or x if sum(x) <= 1
x = [15, 30, 45, 10]

# explode must be len(x) sequence or None
explode = (0.1, 0.1, 0.1, 0.1)

pie(x, explode=explode, labels=labels,
    autopct='%1.1f%%', startangle=67)

title('Rainy days by season')

show()

Pie charts look best if they are inside a square figure and have square axes.

Fractions of the whole sum of the pie chart are defined as x/sum(x), or x itself if sum(x) <= 1. We get the explode effect by defining an explode sequence where each item represents the fraction of the radius with which to offset each wedge. We use the autopct parameter to format the labels that will be drawn inside the wedges; it can be a format string or a callable (function).

We can also use a Boolean shadow parameter to add a shadow effect to a pie chart.

If we don't specify startangle, the fractions will be ordered counterclockwise starting from the x axis (angle 0). If we specify 90 as the value of startangle, that will start the pie chart from the y axis.
This is the resulting pie chart:



Plotting with filled areas 


In this recipe, we will show you how to fill the area under a curve or in between two different curves.


Getting ready 


The matplotlib library allows us to fill areas in between and under curves with color so that we can display the value of that area to the viewer. Sometimes, this is necessary for readers (viewers) to comprehend what the visualization highlights.


How to do it...


Here's one example of how to fill areas between two contours:

from matplotlib.pyplot import figure, show, gca
import numpy as np

x = np.arange(0.0, 2, 0.01)

# two different signals are measured
y1 = np.sin(2*np.pi*x)
y2 = 1.2*np.sin(4*np.pi*x)

fig = figure()
ax = gca()

# plot and
# fill between y1 and y2 where a logical condition is met
ax.plot(x, y1, x, y2, color='black')

ax.fill_between(x, y1, y2, where=y2 >= y1,
                facecolor='darkblue', interpolate=True)
ax.fill_between(x, y1, y2, where=y2 <= y1,
                facecolor='deeppink', interpolate=True)

ax.set_title('filled between')
show()


How it Works... 


After we have generated two signals over a predefined interval, we plot them using a regular plot(). Then we call fill_between() with the required and mandatory properties.

The function fill_between() uses x as the locations at which to pick the y values (y1, y2) and then plots the polygon in the specified colors.

We specify a condition for filling the curve with the where parameter, which accepts Boolean values (which can be expressions), so that the fill happens only where the condition is met. Setting interpolate=True makes each filled region extend to the exact point where the two curves cross, instead of stopping at the nearest x sample.



There's more... 


Similar to other plotting functions, this function also accepts many more parameters, such
as hatch (to specify a pattern to fill with instead of a color) and line options (linewidth
and linestyle).

There is also fill_betweenx(), which provides similar fill features, but between two vertical
curves (that is, curves that give x as a function of y).

The more general function fill() provides the ability to fill any polygon with a color or
a hatch.
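A minimal sketch of these parameters, assuming a recent matplotlib and the off-screen Agg backend; the sine-shaped band and the triangle are made-up data. It fills between two vertical curves with fill_betweenx(), using a hatch pattern and dashed edges instead of a solid color, and then fills an arbitrary polygon with fill():

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no window is opened
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection

# Illustrative data: x is given as a function of y.
y = np.linspace(0, 1, 50)
x1 = np.sin(2 * np.pi * y)
x2 = x1 + 0.5

fig, ax = plt.subplots()

# fill_betweenx() fills horizontally between the curves x1(y) and x2(y);
# hatch replaces the solid color, and the line options style the edges.
band = ax.fill_betweenx(y, x1, x2, facecolor='none', edgecolor='darkblue',
                        hatch='//', linewidth=1, linestyle='--')

# fill() fills any polygon given by its vertices (here, a triangle).
triangle = ax.fill([0, 1, 0.5], [0, 0, 1], color='deeppink', alpha=0.5)

print(isinstance(band, PolyCollection), band.get_hatch(), len(triangle))
```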


Making stacked plots 


In this recipe, we will show you how to produce a stacked plot. Stacked plots are used when
plotting a quantity that can be represented as the sum of several contributions. A stacked
plot allows us to represent not only the overall trend but also the trend of each individual
component contributing to the total quantity.
Getting ready


We will consider the world's energy production as our total quantity and will represent its
detailed breakdown by energy source. We will represent the evolution of each type of energy
production from 1973 to 2014. This data is contained in the file ch03-energy-production.csv.
The data has been taken from http://www.eia.gov/totalenergy/data/monthly/ and reshaped for
the needs of the recipe.


How to do it... 


Here is the code to produce the stacked plot displayed below:

import pandas as pd
import matplotlib.pyplot as plt

# We load the data with pandas.
df = pd.read_csv('ch03-energy-production.csv')

# We give names for the columns that we want to load. Different types
# of energy have been ordered by total production values.
columns = ['Coal', 'Natural Gas (Dry)', 'Crude Oil', 'Nuclear Electric Power',
           'Biomass Energy', 'Hydroelectric Power', 'Natural Gas Plant Liquids',
           'Wind Energy', 'Geothermal Energy', 'Solar/PV Energy']

# We define some specific colors to plot each type of energy produced.
colors = ['darkslategray', 'powderblue', 'darkmagenta', 'lightgreen', 'sienna',
          'royalblue', 'mistyrose', 'lavender', 'tomato', 'gold']

# Let's create the figure.
plt.figure(figsize=(12, 8))
polys = plt.stackplot(df['Year'], df[columns].values.T, colors=colors)

# The legend is not yet supported with stackplot. We will add it manually.
rectangles = []
for poly in polys:
    rectangles.append(plt.Rectangle((0, 0), 1, 1, fc=poly.get_facecolor()[0]))

legend = plt.legend(rectangles, columns, loc=3)
frame = legend.get_frame()
frame.set_color('white')

# We add some information to the plot.
plt.title('Primary Energy Production by Source', fontsize=16)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Production (Quad BTU)', fontsize=16)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.xlim(1973, 2014)

# Finally we show the figure.
plt.show()

Here is the plot we obtain: 
At a glance, we can see that the world's energy production is constantly increasing and has
entered a faster-growing phase since 2005. We can also analyze the evolution of each type
of energy. Coal production is slowly decreasing, while natural gas and crude oil production
are increasing. Nuclear production has also started to decrease. At the top of the stacked
plot (more visible when zooming in), we can see that renewable energies still form a
negligible part of the world's total production. The stacked plot was the perfect tool to
represent this dataset.
How it Works...


The stackplot() command works just like the plot() command but accepts a two-dimensional
array as its second input. This array's first dimension is the number of filled areas to plot,
while its second dimension matches the length of the first input array. In our case, the
shape of df['Year'] is (42,), while the shape of df[columns].values.T is (10, 42).
Note that we use the transpose attribute T in order to get the second array into the right
format. stackplot() creates a list of polygons that we store in the variable polys.
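To make the shape requirement concrete, here is a small sketch with made-up numbers (three hypothetical energy sources over five years) instead of the CSV file, assuming a recent matplotlib and the Agg backend. It builds a (years, sources) table like df[columns].values, transposes it, and checks that stackplot() returns one polygon per row; the cumulative sum shows where the upper edge of each band lies:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical data: one row per year, one column per source,
# laid out like df[columns].values in the recipe.
years = np.arange(2010, 2015)              # shape (5,)
by_year = np.array([[5, 2, 1],
                    [5, 3, 1],
                    [4, 3, 2],
                    [4, 4, 2],
                    [3, 4, 3]])
series = by_year.T                         # shape (3, 5): one row per filled area

fig, ax = plt.subplots()
polys = ax.stackplot(years, series)        # one polygon per row of series

# Each band is drawn on top of the previous ones, so the upper edge of
# band k is the running total of rows 0..k; the last row is the total.
tops = np.cumsum(series, axis=0)
print(series.shape, len(polys), tops[-1].tolist())
```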

The legend is not yet supported with stacked plots. We therefore use the plt.Rectangle()
command to create the legend's rectangles. Each rectangle's color is specified using
poly.get_facecolor()[0], where poly is an element of the list of polygons created by
the stackplot() command.

Plotting the legend is then done simply with the legend() command, using the rectangles as
the first argument and the names of the corresponding energy sources as the second
argument. The third argument specifies the location of the legend. We set the background
of the legend to white by first using the get_frame() method of the legend object, and then
setting its color to white with the frame's set_color() method.
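Recent matplotlib releases also let stackplot() label its polygons directly through a labels argument, so legend() can collect the entries by itself instead of using proxy rectangles. A sketch of that alternative, assuming such a version and the Agg backend, with a shortened, made-up version of the data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

years = np.arange(2010, 2015)
series = np.array([[5, 5, 4, 4, 3],     # illustrative values, one row per source
                   [2, 3, 3, 4, 4],
                   [1, 1, 2, 2, 3]])
names = ['Coal', 'Natural Gas (Dry)', 'Crude Oil']

fig, ax = plt.subplots()
ax.stackplot(years, series, labels=names,
             colors=['darkslategray', 'powderblue', 'darkmagenta'])

# legend() now finds the labeled polygons itself; no proxy rectangles needed.
legend = ax.legend(loc=3)
legend.get_frame().set_color('white')

labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```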


Drawing scatter plots with colored markers 


If you have two variables and want to spot a correlation between them, a scatter plot may
be the right tool for spotting patterns.

This type of plot is also very useful as a starting point for more advanced visualizations of
multidimensional data (for example, a scatter plot matrix).


Getting ready 


Scatter plots display values for two sets of data. The data visualization is done as a collection 
of points not connected by lines. Each of them has its coordinates determined by the value 
of the variables. One variable is controlled (independent variable), while the other variable is 
measured (dependent variable) and is often plotted on the y axis. 
How to do it...


Here's a code sample that draws two plots: one with uncorrelated data and the other with a
strong positive correlation:

import matplotlib.pyplot as plt
import numpy as np

# generate x values
x = np.random.randn(1000)

# random measurements, no correlation
y1 = np.random.randn(len(x))

# strong correlation
y2 = 1.2 + np.exp(x)

ax1 = plt.subplot(121)
plt.scatter(x, y1, color='indigo', alpha=0.3, edgecolors='white',
            label='no correl')
plt.xlabel('no correlation')
plt.grid(True)
plt.legend()

ax2 = plt.subplot(122, sharey=ax1, sharex=ax1)
plt.scatter(x, y2, color='green', alpha=0.3, edgecolors='grey',
            label='correl')
plt.xlabel('strong correlation')
plt.grid(True)
plt.legend()

plt.show()

Here, we also use more parameters, such as color (the color of the markers), alpha (alpha
transparency), edgecolors (the color of the marker edges), and label (the text for the legend
box). We could also pass marker to choose the point marker (the default is a circle).

These are the plots we get:




How it Works... 


A scatter plot is often used to identify a potential association between two variables, and it
is often drawn before fitting a regression function. It gives a good visual picture of the
correlation, particularly for nonlinear relationships. matplotlib provides the scatter()
function to plot x versus y, two one-dimensional arrays of the same length, as a scatter plot.
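A scatter plot shows an association; a correlation coefficient can put a number on it. The sketch below is an addition to the recipe: it uses seeded random data shaped like the example above and compares Pearson's coefficient (linear association) with Spearman's rank coefficient, computed here by hand as Pearson's coefficient on the ranks. Because y2 = 1.2 + exp(x) is monotonic but nonlinear, Spearman's value is 1 while Pearson's is noticeably lower:

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded so the numbers are reproducible
x = rng.standard_normal(1000)
y1 = rng.standard_normal(len(x))    # no correlation
y2 = 1.2 + np.exp(x)                # strong, but nonlinear, correlation


def pearson(a, b):
    """Linear (Pearson) correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


def spearman(a, b):
    """Rank (Spearman) correlation; ties are not handled in this sketch."""
    rank_a = np.argsort(np.argsort(a))   # 0 = smallest value
    rank_b = np.argsort(np.argsort(b))
    return pearson(rank_a, rank_b)


print(round(pearson(x, y1), 2), round(pearson(x, y2), 2), round(spearman(x, y2), 2))
```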
More Plots and Customizations


In this chapter we will learn about: 

► Setting the transparency and size of axis labels

► Adding a shadow to the chart line

► Adding a data table to the figure

► Using subplots

► Customizing grids

► Creating contour plots

► Filling an under-plot area

► Drawing polar plots

► Visualizing the filesystem tree using a polar bar

► Customizing matplotlib with style


Introduction 


In this chapter, we will explore more advanced properties of the matplotlib library. We are going
to introduce more options and will look at how to achieve certain visually pleasing results.


During this chapter, we will seek solutions to some non-trivial problems of representing
data when simple charts are not enough. We will try to use more than one type of graph, or
create hybrid graphs, to cover advanced data structures and the representation they require.
Setting the transparency and size of axis labels


The axis label describes what the data in the figure represents and is quite important for the
viewer's understanding of the figure itself. By providing labels for the axes, we help
the viewer comprehend the information in an appropriate way.


Getting ready 


Before we dive into the code, it is important to understand how matplotlib organizes our figures.

At the top level, there is a Figure instance containing all that we see and some more (that
we don't see). The figure contains, among other things, instances of the Axes class in the
Figure.axes field. The Axes instances contain almost everything we care about: all the
lines, points, ticks, and labels. So, when we call plot(), we are adding a line
(matplotlib.lines.Line2D) to the Axes.lines list. If we plot a histogram (hist()), we are
adding rectangles to the Axes.patches list ("patches" is a term inherited from MATLAB®
and represents the "patch of color" concept).

An instance of Axes also holds references to the XAxis and YAxis instances, which in turn
refer to the x axis and y axis, respectively. The XAxis and YAxis instances manage the drawing
of the axis, labels, ticks, tick labels, locators, and formatters. We can reference these through
Axes.xaxis and Axes.yaxis, respectively. We don't have to go all the way down to the XAxis
or YAxis instances to get to the labels, as matplotlib gives us helper methods (practically
shortcuts) that enable interaction with these labels: matplotlib.pyplot.xlabel() and
matplotlib.pyplot.ylabel().
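The containment described above can be checked directly. A small sketch, assuming a recent matplotlib and the off-screen Agg backend (the plotted values are arbitrary): plot() adds one Line2D to Axes.lines, hist() adds one rectangle per bin to Axes.patches, and the Text returned by pyplot.xlabel() is the same object held by Axes.xaxis.label:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)               # this Axes is stored in fig.axes

ax.plot([1, 2, 3], [1, 4, 9])           # adds one Line2D to ax.lines
ax.hist([1, 1, 2, 3, 3, 3], bins=5)     # adds one Rectangle per bin to ax.patches

label = plt.xlabel("x axis")            # pyplot shortcut acting on the current Axes

print(len(fig.axes), len(ax.lines), len(ax.patches))
print(label is ax.xaxis.label)          # the shortcut returns the axis' own Text
```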


How to do it... 


We will now create a new figure, in which we will:

1. Create a plot with some randomly generated data.

2. Add the title and axes labels.

3. Add alpha settings.

4. Add shadow effects to the title and axes labels.

import matplotlib.pyplot as plt
from matplotlib import patheffects
import numpy as np

data = np.random.randn(70)

fontsize = 18
plt.plot(data)

title = "This is figure title"
x_label = "This is x axis label"
y_label = "This is y axis label"

title_text_obj = plt.title(title, fontsize=fontsize,
                           verticalalignment='bottom')
title_text_obj.set_path_effects([patheffects.withSimplePatchShadow()])

# offset -- set the 'angle' of the shadow
# shadow_rgbFace -- set the color of the shadow
# alpha -- set up the transparency of the shadow
# (older matplotlib releases named these offset_xy and patch_alpha)
offset_xy = (1, -1)
rgbRed = (1.0, 0.0, 0.0)
alpha = 0.8

# customize shadow properties
pe = patheffects.withSimplePatchShadow(offset=offset_xy,
                                       shadow_rgbFace=rgbRed,
                                       alpha=alpha)

# apply them to the xaxis and yaxis labels
xlabel_obj = plt.xlabel(x_label, fontsize=fontsize, alpha=0.5)
xlabel_obj.set_path_effects([pe])

ylabel_obj = plt.ylabel(y_label, fontsize=fontsize, alpha=0.5)
ylabel_obj.set_path_effects([pe])

plt.show()


How it Works... 


We already know all the familiar imports, the parts that generate data, and basic plotting
techniques, so we will skip those. If you are not able to decipher the first few lines of the
example, please refer to Chapter 2, Knowing Your Data, and Chapter 3, Drawing Your
First Plots and Customizing Them, where these concepts are already explained.

After we have plotted the dataset, we are ready to add titles and labels, and to customize
their appearance.
First, we add the title, setting its font size and aligning the text vertically to the bottom.
The other vertical alignment values are center, top, and baseline, but we choose bottom
because the text will have some shadowing. In the next line, we add the shadow effect:
calling matplotlib.patheffects.withSimplePatchShadow() with no parameters adds the
default shadow to the title. In current matplotlib releases, the default parameter values are
offset=(2, -2), shadow_rgbFace=None, and alpha=0.3; older releases, for which this recipe
was first written, named them offset_xy and patch_alpha (with patch_alpha=0.7 as the
default). The path effects are part of the matplotlib module matplotlib.patheffects, which
supports matplotlib.text.Text and matplotlib.patches.Patch.

We now want to add different shadow settings to both the x and y axis labels. First, we
customize the position (offset) of the shadow relative to the parent object, and then we set
the color of the shadow. The color is here represented as a triple (3-tuple) of float values
between 0.0 and 1.0, one for each of the RGB channels. For example, our red color is
represented as (1.0, 0.0, 0.0) (all red, no green, and no blue).
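matplotlib can do this conversion for us. In this small sketch, matplotlib.colors.to_rgb() turns a color name into the same kind of float triple, and to_rgba() appends an alpha channel (0.8 here, matching the recipe's shadow alpha):

```python
from matplotlib.colors import to_rgb, to_rgba

red = to_rgb('red')                # named color -> (R, G, B) floats in [0, 1]
print(red)                         # (1.0, 0.0, 0.0)
print(to_rgba(red, alpha=0.8))     # (1.0, 0.0, 0.0, 0.8)
```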

The transparency (or alpha) is set up as a normalized value, and we also want to set it
here to something different from the default.

With all the settings present, we instantiate matplotlib.patheffects.withSimplePatchShadow
and hold the reference to it in the variable pe to reuse it a few lines later.

To be able to apply the shadow effect, we need to get to the label object. This is simple
enough because matplotlib.pyplot.xlabel() returns a reference to the object
(matplotlib.text.Text) that we then use to call set_path_effects([pe]).

We finally show the plot and can feel proud of our work.


There's more... 


If you are not satisfied with the effects that matplotlib.patheffects currently offers,
you can inherit from the matplotlib.patheffects._Base class and override the draw_path
method. Take a look at the code and comments on how to do this here:

https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/patheffects.py#L47
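In recent matplotlib releases, the base class for path effects is matplotlib.patheffects.AbstractPathEffect (the _Base name above comes from older versions). The sketch below assumes such a release and the Agg backend; GreyUnderlay is a hypothetical effect invented for this example. It draws a thick light-grey copy of each path and is combined with patheffects.Normal() so the original line is still drawn on top:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib import patheffects


class GreyUnderlay(patheffects.AbstractPathEffect):
    """Hypothetical effect: draw a thick light-grey copy of the path."""

    def __init__(self, linewidth=8):
        super().__init__()              # no offset
        self._linewidth = linewidth
        self.calls = 0                  # counts how often draw_path() ran

    def draw_path(self, renderer, gc, tpath, affine, rgbFace=None):
        gc0 = renderer.new_gc()         # work on a copy, never on the shared gc
        gc0.copy_properties(gc)
        gc0.set_linewidth(self._linewidth)
        gc0.set_foreground('lightgrey')
        renderer.draw_path(gc0, tpath, affine, rgbFace)
        gc0.restore()
        self.calls += 1


effect = GreyUnderlay()
fig, ax = plt.subplots()
line, = ax.plot([0, 1, 2], [0, 1, 0], color='magenta', linewidth=3)
# Effects run in order: the grey underlay first, then the normal line on top.
line.set_path_effects([effect, patheffects.Normal()])
fig.canvas.draw()                       # rendering triggers draw_path()
print(effect.calls)
```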


Adding a shadow to the chart line 


To distinguish one particular plot line in the figure, or just to fit in with the overall style
of the output our figure is in, we sometimes need to add a shadow effect to the chart line (or
histogram, for that matter). In this recipe, you will learn how to add a shadow effect to
the plot's chart lines.
Getting ready


To add shadows to the lines or rectangles in our charts, we need to use the transformation
framework built into matplotlib and located in matplotlib.transforms.

To understand how it all works, we need to explain what transformations are available in
matplotlib and how they work.

Transformations know how to convert the given coordinates from their own coordinate system
into display coordinates. They also know how to convert them from display coordinates back
into their own coordinate system.

The following table summarizes the existing coordinate systems and what they represent:


Coordinate system | Transformation object | Description
Data | Axes.transData | Represents the user's data coordinate system.
Axes | Axes.transAxes | Represents the Axes coordinate system, where (0, 0) is the bottom-left corner of the axes and (1, 1) is the upper-right corner of the axes.
Figure | Figure.transFigure | Represents the Figure coordinate system, where (0, 0) is the bottom-left corner of the figure and (1, 1) is the upper-right corner of the figure.
Display | None | Represents the pixel coordinate system of the user display, where (0, 0) is the bottom-left corner of the display and the tuple (width, height) is the upper-right corner, with width and height in pixels.


Note how the Display row has no transformation object. This is because the default
coordinate system is Display, so coordinates are always in pixels relative to your display
coordinate system. This is not very useful, and most often we want them normalized into the
Figure, Axes, or Data coordinate system.
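A small sketch that makes the table concrete, assuming the Agg backend and a figure of 4 x 3 inches at 100 dpi (so the display is 400 x 300 pixels) with one Axes filling the whole figure and data limits of 0 to 10 on both axes. Each transform() call maps a point into display pixels, and inverted() goes back:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 3), dpi=100)    # display: 400 x 300 pixels
ax = fig.add_axes([0, 0, 1, 1])              # Axes covering the whole figure
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

corner = fig.transFigure.transform((1, 1))   # upper-right corner of the figure
centre = ax.transAxes.transform((0.5, 0.5))  # centre of the Axes
point = ax.transData.transform((5, 5))       # data point (5, 5)

# ...and back from display pixels into data coordinates.
back = ax.transData.inverted().transform((200, 150))

print(corner, centre, point, back)
print(np.allclose(point, centre))            # data (5, 5) is the Axes centre here
```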

This framework enables us to transform the current object into an offset object, that is, to 
place that object shifted a certain distance from the original object. 

We will use this framework to create our desired effect on the plotted sine wave. 
How to do it...


Here is the code recipe to add shadowing to the plotted chart. The code is explained in the
section that follows.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms


def setup(layout=None):
    assert layout is not None

    fig = plt.figure()
    ax = fig.add_subplot(layout)
    return fig, ax


def get_signal():
    t = np.arange(0., 2.5, 0.01)
    s = np.sin(5 * np.pi * t)
    return t, s


def plot_signal(axes, t, s):
    line, = axes.plot(t, s, linewidth=5, color='magenta')
    return line


def make_shadow(fig, axes, line, t, s):
    delta = 2 / 72.  # how far to move the shadow, in inches (2 points)

    offset = transforms.ScaledTranslation(delta, -delta, fig.dpi_scale_trans)
    offset_transform = axes.transData + offset

    # We plot the same data, but now using offset transform
    # zorder -- to render it below the line
    axes.plot(t, s, linewidth=5, color='gray',
              transform=offset_transform,
              zorder=0.5 * line.get_zorder())


if __name__ == "__main__":
    fig, axes = setup(111)
    t, s = get_signal()
    line = plot_signal(axes, t, s)

    make_shadow(fig, axes, line, t, s)

    axes.set_title('Shadow effect using an offset transform')
    plt.show()
How it Works...


We start reading the code from the bottom, after the if __name__ check. First, we create
the figure and axes in setup(); after that, we obtain a signal (that is, we generate data: a
sine wave). We plot the basic signal in plot_signal(). Then, we make the shadow
transformation and plot the shadow in make_shadow().

We use the offset effect to create an offset object underneath, and just a few points away
from, the original object.

The original object is a simple sine wave that we plot using the standard function plot().

To create this offset transformation, matplotlib contains a helper transformation:
matplotlib.transforms.ScaledTranslation.

The values for dx and dy are given in inches. Since a point is 1/72 inch, setting
delta = 2 / 72. moves the offset object 2 pt to the right and 2 pt down.

If you want to learn more about why a point is 1/72 inch, read this Wikipedia
article: http://en.wikipedia.org/wiki/Point_%28typography%29.

We can use matplotlib.transforms.ScaledTranslation(xtr, ytr, scaletr); here, xtr and ytr
are translation offsets, and scaletr is a transformation that scales xtr and ytr at
transformation time, before display. The most common use case for this is transforming from
points to display space (for example, through the DPI) so that the offset always stays at the
same place no matter what the actual output is, be it a monitor or printed material. The
transformation we use for this is already built in and is available as
Figure.dpi_scale_trans.
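The arithmetic is easy to check. This sketch, assuming the Agg backend, builds the same ScaledTranslation as the recipe for figures at three different DPI settings and reports how many pixels the shadow moves. The pixel shift grows with the DPI, while the physical distance stays at 2 points:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

delta = 2 / 72.   # 2 points, expressed in inches

shifts = {}
for dpi in (72, 100, 200):
    fig = plt.figure(dpi=dpi)
    # dpi_scale_trans converts inches to pixels for this figure's DPI.
    offset = transforms.ScaledTranslation(delta, -delta, fig.dpi_scale_trans)
    dx, dy = offset.transform((0, 0))   # how far a point at the origin moves
    shifts[dpi] = (round(float(dx), 3), round(float(dy), 3))
    plt.close(fig)

print(shifts)   # {72: (2.0, -2.0), 100: (2.778, -2.778), 200: (5.556, -5.556)}
```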

We then piot the same data with the applied transformation. 



There's more... 


Using transforms to add shadows is just one use case of this framework, and not the most
popular one. To be able to do more with the transformation framework, you will need to learn
the details of how the transformation pipeline works and what the extension points are (which
classes to inherit from and how). This is easy enough because matplotlib is open source, and
even if some code is not well documented, there is source code you can read and use or
change, thus contributing to the overall quality and usefulness of matplotlib.
Adding a data table to the figure


Although matplotlib is mainly a plotting library, it helps us with small errands when we are
creating a chart, such as having a neat data table beside our beautiful chart. In this recipe,
you will learn how to display a data table alongside the plots in the figure.


Getting ready 


It is important to understand why we are adding a table to a chart. The main intention of plotting
data visually is to explain otherwise incomprehensible (or hardly comprehensible) data
values. Now, we want to add that data back. It is not wise just to cram a big table of values
underneath the chart.

But carefully picked values, such as sums or highlighted values from the whole charted
dataset, can identify important parts of the chart or emphasize the important values in
places where the exact value (for example, yearly sales in USD) is important (or even required).


How to do it... 


Here's the code to add a sample table to our figure:

import matplotlib.pyplot as plt
import numpy as np

plt.figure()
ax = plt.gca()
y = np.random.randn(9)

col_labels = ['col1', 'col2', 'col3']
row_labels = ['row1', 'row2', 'row3']
table_vals = [[11, 12, 13], [21, 22, 23], [28, 29, 30]]
row_colors = ['red', 'gold', 'green']

my_table = plt.table(cellText=table_vals,
                     colWidths=[0.1] * 3,
                     rowLabels=row_labels,
                     colLabels=col_labels,
                     rowColours=row_colors,
                     loc='upper right')

plt.plot(y)
plt.show()
The previous code snippet gives a plot such as the following:





How it Works... 


Using plt.table(), we create a table of cells and add it to the current axes. The table can
have (optional) row and column headers. Each table cell contains either a patch or text. The
column widths and row heights for the table can be specified. The return value is the table
object, which holds the cells (patch and text instances) that the table is made of.

The basic function signature is: 

table(cellText=None, cellColours=None, 
cellLoc='right', colWidths=None, 
rowLabels=None, rowColours=None, rowLoc='left', 
colLabels=None, colColours=None, colLoc='center', 
loc='bottom', bbox=None) 

The function instantiates and returns a matplotlib.table.Table instance. As usual with
matplotlib, the object-oriented interface can also be accessed directly: we can use the
matplotlib.table.Table class to fine-tune our table before we add it to our axes instance
with add_table().


There's more...


You can have more control if you directly create an instance of matplotlib.table.Table
and configure it before you add it to the axes instance. You can add the table instance
to the axes using Axes.add_table(table), where table is an instance of
matplotlib.table.Table.
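A sketch of building the table by hand, assuming a recent matplotlib and the Agg backend. A Table is created for an existing axes, filled cell by cell with add_cell() (cell keys are (row, column) pairs, and the values and sizes are illustrative), and then attached with Axes.add_table():

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.table import Table

fig, ax = plt.subplots()
ax.plot([1, 3, 2])

table = Table(ax, loc='upper right')
values = [[11, 12], [21, 22]]
for row, row_vals in enumerate(values):
    for col, val in enumerate(row_vals):
        # width and height are fractions of the axes size
        table.add_cell(row, col, width=0.1, height=0.05,
                       text=str(val), loc='center')

table.auto_set_font_size(False)   # keep the font size we choose
table.set_fontsize(10)
ax.add_table(table)
fig.canvas.draw()                 # make sure the table renders without errors

print(len(table.get_celld()))     # 4 cells
```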


Using subplots


If you are reading this book from the beginning, you are probably familiar with the subplot
class, a descendant of axes that lives on a regular grid of subplot instances. We are
going to explain and demonstrate how to use subplots in advanced ways.

In this recipe, you will learn how to create custom subplot configurations in our plots.


Getting ready 


The base class for subplots is matplotlib.axes.SubplotBase. These subplots are
matplotlib.axes.Axes instances, but provide helper methods for generating and
manipulating a set of Axes within a figure.

There is a class matplotlib.figure.SubplotParams, which holds all the parameters for
subplots. The dimensions are normalized to the width or height of the figure. As we already
know, if we don't specify any custom values, they will be read from the rc parameters.

The scripting layer (matplotlib.pyplot) holds a few helper methods to manipulate
subplots.

matplotlib.pyplot.subplots is used for the easy creation of common layouts of
subplots. We can specify the size of the grid: the number of rows and columns of the
subplot grid.

We can create subplots that share the x or y axes. This is achieved using the sharex or
sharey keyword argument. The sharex argument can have the value True, in which case
the x axis is shared among all the subplots, and the tick labels will be invisible on all but
the last row of plots. It can also be given as a string, with the enumerated values row, col,
all, or none. The value all is the same as True, and the value none is the same as False. If
the value row is specified, each subplot row shares the x axis. If the value col is specified,
each subplot column shares the x axis. This helper returns the tuple fig, ax, where ax is
either an Axes instance or, if more than one subplot is created, an array of Axes instances.
matplotlib.pyplot.subplots_adjust is used to tune the subplot layout. The
keyword arguments specify the coordinates of the subplots inside the figure (left,
right, bottom, and top), normalized to the figure size. White space to be left between
the subplots can be specified using the wspace and hspace arguments for width and
height amounts, respectively.
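A short sketch of these helpers, assuming a recent matplotlib and the Agg backend: a 2 x 2 grid where each column shares its x axis and each row shares its y axis, followed by a subplots_adjust() call whose values end up in the figure's SubplotParams (fig.subplotpars). The spacing values are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# 'col': subplots in the same column share x; 'row': same row shares y.
fig, ax = plt.subplots(2, 2, sharex='col', sharey='row')
print(ax.shape)                               # a 2 x 2 array of Axes

shared_x = ax[0, 0].get_shared_x_axes()
print(shared_x.joined(ax[0, 0], ax[1, 0]))    # same column -> shared
print(shared_x.joined(ax[0, 0], ax[0, 1]))    # different columns -> not shared

# Margins are fractions of the figure size; wspace/hspace set the gaps.
fig.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.9,
                    wspace=0.3, hspace=0.3)
print(fig.subplotpars.wspace)
```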


How to do it... 


1. We will show you an example of using yet another helper function in the matplotlib
toolkit: subplot2grid. We define the grid's geometry and the subplot location. Note
that this location is 0-based, not 1-based as we are used to in plt.subplot(). We
can also use colspan and rowspan to allow the subplot to span multiple columns
and rows in a given grid. For example, we will create a figure, add various subplot
layouts using subplot2grid, and reconfigure the tick label size.

2. Show the plot:

import matplotlib.pyplot as plt

plt.figure(0)

axes1 = plt.subplot2grid((3, 3), (0, 0), colspan=3)
axes2 = plt.subplot2grid((3, 3), (1, 0), colspan=2)
axes3 = plt.subplot2grid((3, 3), (1, 2))
axes4 = plt.subplot2grid((3, 3), (2, 0))
axes5 = plt.subplot2grid((3, 3), (2, 1), colspan=2)

# tidy up tick labels size
all_axes = plt.gcf().axes
for ax in all_axes:
    for ticklabel in ax.get_xticklabels() + ax.get_yticklabels():
        ticklabel.set_fontsize(10)

plt.suptitle("Demo of subplot2grid")
plt.show()
When we execute the previous code, the following plot is created:



How it Works... 


We provide subplot2grid with a shape, a location (loc), and, optionally, rowspan and
colspan. The important difference here is that the location is indexed from 0, and not
from 1, as in figure.add_subplot.
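subplot2grid() can be compared with GridSpec slicing. The sketch below assumes a recent matplotlib (where Axes.get_subplotspec() and the SubplotSpec.rowspan and colspan properties are available) and the Agg backend. It shows that subplot2grid((3, 3), (1, 0), colspan=2) occupies the same cells as the 0-based slice gs[1, 0:2]:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure()
gs = GridSpec(3, 3, figure=fig)

# Row 1, columns 0 and 1, expressed in two equivalent ways.
a = plt.subplot2grid((3, 3), (1, 0), colspan=2, fig=fig)
b = fig.add_subplot(gs[1, 0:2])

for axes in (a, b):
    spec = axes.get_subplotspec()
    print(spec.rowspan, spec.colspan)   # range(1, 2) range(0, 2)
```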
There's more...


As another example, you can customize the current axes or subplot:

import matplotlib.pyplot as plt

fig = plt.figure()
axes = fig.add_subplot(111)
rectangle = axes.patch
rectangle.set_facecolor('blue')

Here we see that every axes instance contains a field patch referencing the rectangle
instance that represents the background of the current axes instance. This instance has
properties that we can update, hence updating the current axes background. We can change
its color, but we can also load an image to add watermark protection, for example.

It is also possible to create a patch first and then just add it to the axes background:

import matplotlib.pyplot as plt
import matplotlib.patches

fig = plt.figure()
axes = fig.add_subplot(111)

rect = matplotlib.patches.Rectangle((1, 1), width=6, height=12)
axes.add_patch(rect)

# we have to manually force a figure draw
axes.figure.canvas.draw()


Customizing grids 


A grid is usually handy to have under lines and charts, as it helps the human eye spot
differences in patterns and compare plots visually in the figure. To be able to set up how
visibly, how frequently, and in what style the grid is displayed, or whether it is displayed
at all, we should use matplotlib.pyplot.grid.

In this recipe, you wiii be iearning how to turn the grid on and off and how to change the 
major and minor ticks on a grid. 
Getting ready


The most frequent grid customization is reachable through the matplotlib.pyplot.grid
helper function.

To see the interactive effect of this, you should run the following in IPython. The basic
call to plt.grid() will toggle the grid visibility in the current interactive session, started
in the IPython PyLab environment:

In [1]: plt.plot([1, 2, 3, 3.5, 4, 4.3, 3])

Out[1]: [<matplotlib.lines.Line2D at 0x3dcc810>]



Now, we can toggle the grid on the same figure:

In [2]: plt.grid() 
We turn the grid on, as shown in the following plot:



We then turn it off again: 
In [3]: plt.grid() 
Apart from just turning it on and off, we can further customize the grid's appearance.

We can manipulate the grid with just major ticks, just minor ticks, or both; hence the
function argument which, which can be 'major', 'minor', or 'both'. Similarly, we can
control the horizontal and vertical ticks separately using the argument axis, which can have
the values 'x', 'y', or 'both'.

All the other properties are passed via kwargs and represent the standard set of properties
that a matplotlib.lines.Line2D instance can accept, such as color, linestyle, and
linewidth; here is an example:

ax.grid(color='g', linestyle='--', linewidth=1)
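Putting which, axis, and the Line2D keyword arguments together, here is a sketch (assuming a recent matplotlib and the Agg backend) that turns on dashed green major grid lines for the y axis only and dotted grey minor grid lines for the x axis. The grid lines of each axis can then be inspected through get_gridlines():

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 3.5, 4, 4.3, 3])

# Major grid lines on the y axis only, styled like any Line2D.
ax.grid(True, which='major', axis='y', color='g', linestyle='--', linewidth=1)

# Minor ticks must be switched on before a minor grid can follow them.
ax.minorticks_on()
ax.grid(True, which='minor', axis='x', color='0.8', linestyle=':')

y_lines = ax.yaxis.get_gridlines()   # major grid lines of the y axis
x_lines = ax.xaxis.get_gridlines()   # major grid lines of the x axis
print(all(l.get_visible() for l in y_lines), any(l.get_visible() for l in x_lines))
```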


How to do it... 


This is nice, but we want to be able to customize more. In order to do that, we need to reach
deeper into matplotlib and into mpl_toolkits, and find the AxesGrid module that allows
us to make grids of axes in an easy and manageable way:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from matplotlib.cbook import get_sample_data


def get_demo_image():
    f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False)
    # z is a numpy array of 15x15
    Z = np.load(f)
    return Z, (-3, 4, -4, 3)


def get_grid(fig=None, layout=None, nrows_ncols=None):
    assert fig is not None
    assert layout is not None
    assert nrows_ncols is not None

    grid = ImageGrid(fig, layout, nrows_ncols=nrows_ncols,
                     axes_pad=0.05, label_mode="L")
    return grid


def load_images_to_grid(grid, Z, *images):
    vmin, vmax = Z.min(), Z.max()
    for i, image in enumerate(images):
        axes = grid[i]
        axes.imshow(image, origin="lower", vmin=vmin, vmax=vmax,
                    interpolation="nearest")


if __name__ == "__main__":
    fig = plt.figure(1, (8, 6))
    grid = get_grid(fig, 111, (1, 3))
    Z, extent = get_demo_image()

    # Slice image
    image1 = Z
    image2 = Z[:, :10]
    image3 = Z[:, 10:]

    load_images_to_grid(grid, Z, image1, image2, image3)

    plt.draw()
    plt.show()

The given code will render the following plot: 
How it Works...


In the get_demo_image function, we load data from the sample data directory that comes
with matplotlib.

The list grid holds our axes grid (in this case, an ImageGrid).

The variables image1, image2, and image3 hold sliced data from Z that we have split over
multiple axes in the list grid.

Looping over all the axes in the grid, we plot the data from image1, image2, and image3 using
the standard imshow() call, while matplotlib takes care that everything is neatly rendered and aligned.


Creating contour plots 


A contour plot displays the isolines of a matrix. Isolines are curves where a function of two 
variables has the same value. 

In this recipe, you will learn how to create contour plots.


Getting ready 


Contours are represented as contour plots of the matrix Z, where Z is interpreted as height
with respect to the XY plane. Z must be at least 2 x 2 in size and must contain at least two
different values.

The problem with contour plots is that if they are coded without labeling the isolines,
they are rendered pretty useless, as we cannot tell the high points from the low points or
find local minima.

Here, we need to label the contours as well. The labeling of isolines can be done by using
either labels (clabel()) or colormaps. If your output medium permits the use of color,
colormaps are preferred because viewers will be able to decode the data more easily.

The other risk with contour plots is in choosing the number of isolines to plot. If we choose
too many, the plot becomes too dense to decode, and if we go with too few isolines, we lose
information and can perceive the data differently.

The contour () function will automatically guess how many isolines to plot, but we also have 
the ability to specify our own number. 

In matplotlib, we draw contour plots using matplotlib.pyplot.contour.

There are two similar functions: contour() draws contour lines, and contourf() draws
filled contours. We are going to demonstrate only contour(), but almost everything is
applicable to contourf(). They understand almost the same arguments as well.
The contour() function can have different call signatures, depending on what data we have
and/or what the properties that we want to visualize are.


Call signature          | Description
------------------------|---------------------------------------------------------------
contour(Z)              | Plots the contours of Z (an array). The level values are
                        | chosen automatically.
contour(X, Y, Z)        | Plots the contours of Z at the (x, y) surface coordinates
                        | given by the arrays X and Y.
contour(Z, N)           | Plots the contours of Z, where the number of levels is
contour(X, Y, Z, N)     | defined by N. The level values are chosen automatically.
contour(Z, V)           | Plots contour lines at the level values specified in the
contour(X, Y, Z, V)     | sequence V.
contourf(..., V)        | Fills the len(V) - 1 regions between the level values in
                        | the sequence V.
contour(Z, **kwargs)    | Uses keyword arguments to control common line properties
                        | (colors, line width, origin, colormap, and so on).


X, Y, and Z must satisfy certain constraints on their dimensionality and shape. X and Y can
be two-dimensional, with the same shape as Z. If they are one-dimensional, the length of X
must equal the number of columns in Z, and the length of Y must equal the number of rows
in Z.
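The shape rules can be checked directly. The sketch below builds one-dimensional x and y vectors, turns them into two-dimensional grids with np.meshgrid, and contours the same Z both ways. It also shows that swapping the 1-D vectors is rejected. The test surface X**2 - Y**2 and the grid sizes are arbitrary choices. The exact exception type for mismatched shapes is an assumption, so both TypeError and ValueError are caught.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 50)   # 50 values -> must match the number of columns of Z
y = np.linspace(-1, 1, 30)   # 30 values -> must match the number of rows of Z
X, Y = np.meshgrid(x, y)     # both grids have shape (30, 50)
Z = X ** 2 - Y ** 2          # arbitrary saddle-shaped test surface

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
cs_1d = ax1.contour(x, y, Z)   # 1-D coordinate vectors
cs_2d = ax2.contour(X, Y, Z)   # 2-D coordinate grids, same shape as Z

# Swapping the vectors breaks the rule: len(y) == 30 != 50 columns.
mismatch_error = None
try:
    ax1.contour(y, x, Z)
except (TypeError, ValueError) as err:
    mismatch_error = str(err)

print(Z.shape, mismatch_error)
```

Both calls produce the same contour levels, because the levels depend only on Z.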


How to do it... 


In the following code example, we will:

1. Implement a function to act as a mock signal processor.

2. Generate some linear signal data.

3. Transform the data into suitable matrices for use in matrix operations.

4. Plot contour lines.

5. Add contour line labels.

6. Show the plot.

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt


def process_signals(x, y):
    """Mock signal processor."""
    return (1 - (x ** 2 + y ** 2)) * np.exp(-y ** 3 / 3)


x = np.arange(-1.5, 1.5, 0.1)
y = np.arange(-1.5, 1.5, 0.1)

# Make grids of points
X, Y = np.meshgrid(x, y)

Z = process_signals(X, Y)

# Level values at which the isolines are drawn
N = np.arange(-1, 1.5, 0.3)

# Add the contour lines with labels
CS = plt.contour(Z, N, linewidths=2, cmap=mpl.cm.jet)
plt.clabel(CS, inline=True, fmt='%1.1f', fontsize=10)
plt.colorbar(CS)

plt.title('My function: $z=(1-(x^2+y^2)) e^{-y^3/3}$')
plt.show()

This will give us the following chart: 


[Figure: contour plot titled "My function: z = (1 - (x^2 + y^2)) e^(-y^3/3)"]
How it Works...


We reached for little helpers from NumPy (arange() and meshgrid()) to create our ranges
and matrices.

After we evaluated process_signals() into Z, we simply called contour(), providing Z and
the level values for the isolines.

At this point, try experimenting with the third parameter in the N arange() call. For example,
instead of N = np.arange(-1, 1.5, 0.3), try changing 0.3 to 0.1 or 1 to see how the same
data is perceived differently, depending on how we encode the data in a contour plot.
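Here is a sketch of that experiment. It draws the same surface with three step sizes side by side and reuses the process_signals() formula from the listing. The Agg backend, the subplot layout, and the level_counts dictionary are illustrative choices, not part of the original recipe.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def process_signals(x, y):
    return (1 - (x ** 2 + y ** 2)) * np.exp(-y ** 3 / 3)


x = np.arange(-1.5, 1.5, 0.1)
X, Y = np.meshgrid(x, x)
Z = process_signals(X, Y)

steps = [1.0, 0.3, 0.1]          # coarse -> fine level spacing
level_counts = {}
fig, axes = plt.subplots(1, len(steps), figsize=(12, 4))
for ax, step in zip(axes, steps):
    levels = np.arange(-1, 1.5, step)
    cs = ax.contour(Z, levels, linewidths=1)
    ax.clabel(cs, inline=True, fmt='%1.1f', fontsize=7)
    ax.set_title('step = {}: {} levels'.format(step, len(levels)))
    level_counts[step] = len(levels)

print(level_counts)
```

The coarse step hides most of the structure, while the fine step makes the labels crowd each other.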

We also added a color bar by simply passing CS (a matplotlib.contour.QuadContourSet
instance) to colorbar().


Filling an under-plot area 


The basic way to draw a filled polygon in matplotlib is to use matplotlib.pyplot.fill.
This function accepts arguments similar to matplotlib.pyplot.plot: multiple x and y
pairs, plus keyword arguments that set the polygon properties. This function returns the
list of Polygon patches that were added.
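The recipes in this section do not call fill() directly, so here is a minimal sketch of it. A single call with two x, y, color groups draws two polygons and returns one Polygon patch per group. The triangle, the square, and the colors are chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Each group is: x coordinates, y coordinates, optional color.
patches = ax.fill([0, 1, 0.5], [0, 0, 1], 'steelblue',          # a triangle
                  [1.5, 2.5, 2.5, 1.5], [0, 0, 1, 1], 'orange',  # a square
                  alpha=0.7)
print(len(patches), type(patches[0]).__name__)
```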

In this recipe, you will learn how to shade certain areas of plot intersections.


Getting ready 


matplotlib provides several functions to help us plot filled figures, apart from plotting
functions that inherently plot closed filled polygons, such as hist(), of course.

We already mentioned one, matplotlib.pyplot.fill, but there are also the
matplotlib.pyplot.fill_between() and matplotlib.pyplot.fill_betweenx() functions.
These functions fill the polygons between two curves. The main difference between them is
the direction: fill_between() fills the area between two curves y1 and y2 defined over the
same x values, whereas fill_betweenx() fills the area between two curves x1 and x2 defined
over the same y values.
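The difference in direction is easiest to see side by side. This sketch fills the same lens-shaped region twice. The first call uses fill_between() and treats the two curves, t**2 and sqrt(t) on [0, 1], as functions of x. The second uses fill_betweenx() and treats them as functions of y. The curves and colors are illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 50)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

# fill_between(x, y1, y2): vertical span between y1(x) and y2(x)
area_h = ax1.fill_between(t, t ** 2, np.sqrt(t), color='silver')
ax1.set_title('fill_between(x, y1, y2)')

# fill_betweenx(y, x1, x2): horizontal span between x1(y) and x2(y)
area_v = ax2.fill_betweenx(t, t ** 2, np.sqrt(t), color='gold')
ax2.set_title('fill_betweenx(y, x1, x2)')
```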

The fill_between function accepts the argument x (an x axis array of data) and y1 and y2
(the y axis arrays of the data). Using the where argument, we can specify conditions under
which the area will be filled. This condition is a Boolean condition, usually specifying
y axis value ranges. The default value is None, meaning fill everywhere.
How to do it...


To start off with a simple example, we will fill the area under a simple function:

import numpy as np
import matplotlib.pyplot as plt
from math import sqrt

t = range(1000)
y = [sqrt(i) for i in t]
plt.plot(t, y, color='red', lw=2)
plt.fill_between(t, y, color='silver')
plt.show()

The preceding code gives us the following plot: 



This is fairly straightforward and gives an idea of how fill_between() works. Note how we
needed to plot the actual function line (using plot(), of course), whereas fill_between()
just draws a polygonal area filled with color ('silver').

We will demonstrate another recipe here. It will involve more conditioning for the fill function.
The following is the code for the example:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0.0, 2, 0.01)

y1 = np.sin(np.pi * x)
y2 = 1.7 * np.sin(4 * np.pi * x)

fig = plt.figure()

axes1 = fig.add_subplot(211)
axes1.plot(x, y1, x, y2, color='grey')
axes1.fill_between(x, y1, y2, where=y2 <= y1, facecolor='blue',
                   interpolate=True)
axes1.fill_between(x, y1, y2, where=y2 >= y1, facecolor='gold',
                   interpolate=True)
axes1.set_title('Blue where y2 <= y1. Gold-color where y2 >= y1.')
axes1.set_ylim(-2, 2)

# Mask values in y2 with value greater than 1.0
y2 = np.ma.masked_greater(y2, 1.0)
axes2 = fig.add_subplot(212, sharex=axes1)
axes2.plot(x, y1, x, y2, color='black')
axes2.fill_between(x, y1, y2, where=y2 <= y1, facecolor='blue',
                   interpolate=True)
axes2.fill_between(x, y1, y2, where=y2 >= y1, facecolor='gold',
                   interpolate=True)
axes2.set_title('Same as above, but mask')
axes2.set_ylim(-2, 2)
axes2.grid(True)

plt.show()

The preceding code will render the following plot: 
How it Works...


For this example, we first created two sinusoidal functions that overlap at certain points. 

We also created two subplots to compare the two variations that render filled regions.

In both cases, we used fill_between() with an argument, where, that accepts an
N-length Boolean array and fills regions where the value equals True.

The bottom subplot illustrates masked_greater, which masks an array at values greater than
a given value. This is a function from the numpy.ma package to handle missing or invalid
values. We turned the grid on in the bottom axes to make it easier to spot this.
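To see what masked_greater() does on its own, here is a tiny NumPy-only sketch with made-up values. Masked entries are not deleted; they are flagged in the mask. matplotlib skips masked points when drawing lines, which is what creates the gaps in the bottom subplot.

```python
import numpy as np

y2 = np.array([0.5, 1.2, 0.9, 1.7, -0.3])
masked = np.ma.masked_greater(y2, 1.0)   # mask every value > 1.0

print(masked.mask)          # which positions are hidden
print(masked.compressed())  # only the unmasked values
print(masked.data)          # the original values are still stored
```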


Drawing polar plots 


If the data is already represented using polar coordinates, we can also display it using polar
figures. Even if the data is not in polar coordinates, we should consider converting it to polar
form and drawing it on polar plots.

To decide whether we want to do this, we need to understand what the data represents and 
what we are hoping to display to the end user. Imagining what the user will read and decode 
from our figures usually leads us to the best visualizations. 

Polar plots are commonly used to display information that is radial in nature. For example, in
sun path diagrams, we see the sky in radial projection, and the radiation maps of antennas show
how they radiate differently at different angles. You can learn more about this at
http://www.astronwireless.com/topic-archives-antenna-radiation-patterns.asp.

In this recipe, you will learn how to change the coordinate system used in the plot and to use 
the polar coordinate system instead. 


Getting ready 


To display data in polar coordinates, we must have appropriate data values. In the polar
coordinate system, a point is described with a radius distance (usually denoted by r) and an
angle (usually theta). The angle can be expressed in radians or degrees; matplotlib's polar
plotting functions expect radians, while a few helpers, such as thetagrids(), take degrees.

Similarly to the function plot(), to draw polar plots, we will use the polar()
function, which accepts two same-length arrays of parameters, theta and r, for the angle
array and radius array, respectively. The function also accepts the same formatting
arguments as plot() does.

We also need to tell matplotlib that we want axes in the polar coordinate system. This is done
by providing the polar=True argument to the add_axes or add_subplot functions.
Additionally, to set other properties on the figure, such as grids on radii or angles, we
use matplotlib.pyplot.rgrids() to set the radial grid lines and their labels.
Similarly, we use matplotlib.pyplot.thetagrids() to configure angle ticks and labels.
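Here is a short sketch of these pieces together. The four-lobed |cos(2 theta)| pattern is made-up illustration data. It is sampled in degrees and converted with np.deg2rad, because polar() expects radians. rgrids() takes radii, while thetagrids() takes angles in degrees.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

theta_deg = np.arange(0, 360, 5)      # sample every 5 degrees
theta = np.deg2rad(theta_deg)         # polar() works in radians
r = np.abs(np.cos(2 * theta))         # illustrative four-lobed pattern, 0..1

fig = plt.figure()
ax = fig.add_subplot(111, polar=True)
plt.polar(theta, r, 'b-', lw=2)       # same formatting arguments as plot()

# Radial grid lines at these radii, with explicit labels.
plt.rgrids([0.25, 0.5, 0.75, 1.0], labels=['0.25', '0.5', '0.75', '1.0'])
# Angular grid lines every 45 degrees (thetagrids() takes degrees).
plt.thetagrids(range(0, 360, 45))
```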


How to do it... 


Here is one recipe that demonstrates how to plot polar bars:

import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt

figsize = 7

# map a radius (0..20) to a color of the Set2 colormap
colormap = lambda r: cm.Set2(r / 20.)

N = 18  # number of bars

fig = plt.figure(figsize=(figsize, figsize))
ax = fig.add_axes([0.2, 0.2, 0.7, 0.7], polar=True)

# N evenly spaced angles; linspace avoids the float-rounding surprises of arange
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 20 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)

bars = ax.bar(theta, radii, width=width, bottom=0.0)
for r, bar in zip(radii, bars):
    bar.set_facecolor(colormap(r))
    bar.set_alpha(0.6)

plt.show()

The preceding code snippet will give us the following plot:


How it Works...


First, we create a square figure and add the polar axes to it. The figure does not have to be
square, but then our polar plot will be ellipsoidal.

We then generate evenly spaced values for a set of angles (theta) and random values for a set
of polar distances (radii). Since we are drawing bars, we also need a width for each bar, so we
generate a set of random widths as well. matplotlib.axes.Axes.bar accepts arrays of values (as
almost all the drawing functions in matplotlib do), so we don't have to loop over this generated
dataset; we just need to call bar() once with all the arguments passed to it.

In order to make every bar easily distinguishable, we loop over each bar added to ax
(Axes) and customize its appearance (face color and transparency).
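Looping over the bars is one way to style them. As an alternative sketch (not the book's code), bar() also accepts a sequence of colors, one per bar. The colors can then be computed up front from the Set2 colormap and passed in a single call. The seeded generator from np.random.default_rng is an assumption added here to make the output repeatable.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib import cm

N = 18
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 20 * rng.random(N)
width = np.pi / 4 * rng.random(N)

# One RGBA color per bar, looked up from Set2 with radii scaled to 0..1.
colors = cm.Set2(radii / 20.0)          # array of shape (N, 4)

fig = plt.figure(figsize=(7, 7))
ax = fig.add_axes([0.2, 0.2, 0.7, 0.7], polar=True)
# alpha applies to every bar, replacing the per-bar set_alpha() calls
bars = ax.bar(theta, radii, width=width, bottom=0.0, color=colors, alpha=0.6)
```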


Visualizing the filesystem tree using a polar bar


We want to show in this recipe how to solve a "real-world" task: how to use matplotlib to
visualize our directory occupancy.

In this recipe, you will learn how to visualize a filesystem tree with relative sizes.


Getting ready 


We all have big hard drives that sometimes contain stuff that we usually forget about. It would
be nice to see what is inside such a directory, and what the biggest file inside it is.

Although there are many more sophisticated and elaborate software products for this job,
we want to demonstrate how this is achievable using Python and matplotlib.


How to do it... 


Let's perform the following steps:

1. Implement a few helper functions to deal with folder discovery and internal
data structures.

2. Implement the main function, draw(), that does the plotting.

3. Implement the main program body that verifies the user input arguments:

import os
import sys

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

def build_folders(start_path):
    folders = []

    for each in get_directories(start_path):
        size = get_size(each)
        if size >= 25 * 1024 * 1024:  # keep only folders of at least 25 MB
            folders.append({'size': size, 'path': each})

    for each in folders:
        print("Path: " + os.path.basename(each['path']))
        print("Size: " + str(each['size'] // 1024 // 1024) + " MB")
    return folders


def get_size(path):
    assert path is not None

    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            try:
                size = os.path.getsize(fp)
                total_size += size
                # print("Size of '{0}' is {1}".format(fp, size))
            except OSError as err:
                # e.g. broken links or missing permissions: report and skip
                print(str(err))

    return total_size


def get_directories(path):
    dirs = set()
    for dirpath, dirnames, filenames in os.walk(path):
        dirs = set([os.path.join(dirpath, x) for x in dirnames])
        break  # we just want the first one
    return dirs


def draw(folders):
    """Draw folder size for given folder"""
    figsize = (8, 8)  # keep the figure square
    ldo, rup = 0.1, 0.8  # left-down and right-up corners, normalized

    fig = plt.figure(figsize=figsize)
    ax = fig.add_axes([ldo, ldo, rup, rup], polar=True)

    # transform data
    x = [os.path.basename(x['path']) for x in folders]
    y = [y['size'] / 1024 / 1024 for y in folders]
    # linspace guarantees exactly len(x) angles (arange can add one more)
    theta = np.linspace(0.0, 2 * np.pi, len(x), endpoint=False)
    radii = y

    bars = ax.bar(theta, radii)
    middle = 90 / len(x)

    # thetagrids() takes angles in degrees; the old 'frac' argument
    # is not accepted by recent matplotlib versions
    theta_ticks = [t * (180 / np.pi) + middle for t in theta]
    lines, labels = plt.thetagrids(theta_ticks, labels=x)

    for step, each in enumerate(labels):
        each.set_rotation(theta[step] * (180 / np.pi) + middle)
        each.set_fontsize(8)

    # configure bars: scale sizes to 0..1 before looking up a color
    colormap = lambda r: cm.Set2(r / max(radii))
    for r, each in zip(radii, bars):
        each.set_facecolor(colormap(r))
        each.set_alpha(0.5)

    plt.show()

4. Next, we will implement the main program body where we verify the input arguments 
given by the user when the program is called from the command line: 

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("ERROR: Please supply path to folder.")
        sys.exit(-1)

    start_path = sys.argv[1]

    if not os.path.exists(start_path):
        print("ERROR: Path must exist.")
        sys.exit(-1)

    folders = build_folders(start_path)
    if len(folders) < 1:
        print("ERROR: Path does not contain any folders.")
        sys.exit(-1)

    draw(folders)

5. You need to run the following from the command line: 

$ python ch04_recll_filesystem.py /usr/

6. It will produce a plot similar to this one: 



How it Works... 


We will start from the bottom of the code, after if __name__ == '__main__':, because
this is the place where our program starts.

Using the sys module, we pick up the command-line arguments; they represent the path to
the directory we want to visualize.

The function build_folders builds the list of dictionaries, each containing the size and
path that it found inside the given start_path. This function calls get_directories,
which returns a set of the immediate subdirectories of start_path. Later, for each directory
found, we calculate its size in bytes using the get_size function and keep only the
directories of at least 25 MB.


For debugging purposes, we print our dictionary so that we are able to compare the figure 
against what our data looks like. 
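The size bookkeeping can be checked without pointing the script at a real disk. This self-contained sketch restates the logic of get_size() and get_directories() under the hypothetical names folder_size() and top_level_dirs(). It runs them against a throwaway directory tree built with tempfile. It is a test harness for the idea, not a replacement for the recipe's functions.

```python
import os
import tempfile


def folder_size(path):
    """Sum the sizes (in bytes) of all files below path, like get_size()."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or is unreadable: skip it
    return total


def top_level_dirs(path):
    """Immediate subdirectories of path, like get_directories()."""
    _, dirnames, _ = next(os.walk(path))
    return {os.path.join(path, d) for d in dirnames}


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'nested'))
    os.makedirs(os.path.join(root, 'b'))
    with open(os.path.join(root, 'a', 'nested', 'f1.bin'), 'wb') as fh:
        fh.write(b'\0' * 3000)   # counted under 'a' via the nested folder
    with open(os.path.join(root, 'b', 'f2.bin'), 'wb') as fh:
        fh.write(b'\0' * 1000)
    sizes = {os.path.basename(d): folder_size(d) for d in top_level_dirs(root)}

print(sizes)
```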




After we have built the folders as a list of dictionaries, we pass them to a function, draw,
that performs all the work of transforming the data to the right dimensions (here, we are
using the polar coordinate system), constructing the polar figure, and drawing all the bars,
ticks, and labels.

Strictly speaking, we should divide this job into smaller functions, especially if this code is to
be further developed.


Customizing matplotlib with style


The default style configuration of matplotlib is made to satisfy the requirements of a wide
audience, but this means that we always have to spend some time customizing the details
that we care about. In this recipe, we want to show how to create custom and reusable styles
for matplotlib so that we make our changes only once.


Getting ready 


matplotlib looks for user-defined styles in a directory called stylelib, under the
configuration directory of matplotlib (the built-in styles ship inside the matplotlib package
itself). To check the path of the configuration directory, we can use the get_configdir()
function:

In [1]: import matplotlib

In [2]: matplotlib.get_configdir()
Out[2]: u'~/.matplotlib'

In the stylelib subdirectory of this directory, we will store the files that specify our custom
styles.


How to do it... 


First, we will create the file that contains all the specifications of our style:

axes.titlesize : 12
lines.linewidth : 2
xtick.labelsize : 8
ytick.labelsize : 8
figure.facecolor : white
figure.edgecolor : 555555
xtick.color : 555555

# axes.prop_cycle replaces the older axes.color_cycle setting
axes.prop_cycle : cycler('color', ['E24A33', '348ABD'])

# E24A33 : red

# 348ABD : blue

axes.facecolor : EEEEEE
This style must be saved in the matplotlib config directory, under the stylelib directory,
with the name mystyle.mplstyle. Right after creating this file, we can use the style (if
matplotlib is already imported in a running session, call plt.style.reload_library() first so
that the new file is picked up):

import matplotlib.pyplot as plt
import numpy as np

plt.style.use('mystyle')

x = np.linspace(-2 * np.pi, 2 * np.pi, 100)

plt.title('sin(x)')
plt.xlabel('X')
plt.ylabel('y')

plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))

plt.show()

The result is as follows:





How it Works... 


Each line in the file mystyle.mplstyle modifies one of the elements of the matplotlib
style. In the first line, we set the font size of the axes titles to 12; in the second one, we
set the width of the lines to 2; and so on. The style is activated by passing a string with the
name of the style to the matplotlib.pyplot.style.use() method. The name of the style is the
filename without the .mplstyle extension, and we can check all the available styles by
printing plt.style.available.
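A style does not have to live in a file. plt.style.context() also accepts a dictionary of rcParams and applies it only inside a with block. That is handy for trying out settings before saving them to mystyle.mplstyle. The dictionary below mirrors a few lines of that file. The names my_style, before, inside, and outside are illustrative.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

my_style = {
    'axes.titlesize': 12,
    'lines.linewidth': 2,
    'xtick.labelsize': 8,
    'ytick.labelsize': 8,
    'axes.facecolor': '#EEEEEE',
}

print('Some built-in styles:', plt.style.available[:5])

before = plt.rcParams['lines.linewidth']
with plt.style.context(my_style):
    inside = plt.rcParams['lines.linewidth']      # temporary value
    fig, ax = plt.subplots()
    line, = ax.plot([0, 1], [0, 1])
    line_width = line.get_linewidth()             # picked up at creation time
outside = plt.rcParams['lines.linewidth']         # restored after the block
```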
Making 3D Visualizations


You will learn the following recipes in this chapter: 

► Creating 3D bars 

► Creating 3D histograms 

► Animating in matplotlib

► Animating with OpenGL 


Introduction 


Visualization in 3D is sometimes effective and sometimes inevitable. Here, we present some
examples that will satisfy the most frequent requirements.

The content of this chapter will introduce and explain some topics on 3D visualizations.


Creating 3D bars 


Although matplotlib is mainly focused on plotting, and mainly in two dimensions, there are
different extensions that enable us to plot over geographical maps, to integrate more with
Excel, and to plot in 3D. These extensions are called toolkits in the matplotlib world. A toolkit is a
collection of specific functions focused on one topic, such as plotting in 3D.

Popular toolkits are Basemap, GTK Tools, Excel Tools, Natgrid, AxesGrid, and mplot3d.


We will explore more about mplot3d in this recipe. The mpl_toolkits.mplot3d toolkit provides
some basic 3D plotting. Supported plots are scatter, surf, line, and mesh plots. Although this
is not the best 3D plotting library, it comes with matplotlib, and we are already familiar with
its interface.


Getting ready 


Basically, we still need to create a figure and add the desired axes to it. The difference is that
we now specify a 3D projection for the figure, and the axes we add are Axes3D.

Now, we can use almost the same functions for plotting. Of course, the difference is in the
arguments, as we now have three axes, which we need to provide data for.

For example, the mpl_toolkits.mplot3d.Axes3D.plot function specifies the xs, ys,
zs, and zdir arguments. All others are transferred directly to matplotlib.axes.Axes.plot.
We will explain these specific arguments:

► xs, ys: These are the coordinates for the X and Y axes.

► zs: This is the value(s) for the Z axis. There can be one value for all the points, or one
for each point.

► zdir: This chooses which axis is used as the Z-axis dimension, that is, along which axis
the zs values are placed (usually this is z, but it can also be x or y).
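A small sketch makes the zs and zdir arguments concrete. It assumes a matplotlib recent enough to accept projection='3d'; the Axes3D import registers it on older versions. The first curve uses a single zs value with zdir='y', so the whole sine wave lies in the plane y = 1. The second passes one zs value per point along the default z axis and produces a helix. Both curves are illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on old versions)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

t = np.linspace(0, 2 * np.pi, 100)

# One zs value shared by all points, placed along the y axis:
# x = t, z = sin(t), y = 1 everywhere.
ax.plot(t, np.sin(t), zs=1, zdir='y', label='zs=1, zdir="y"')

# One zs value per point along the default z axis: a helix.
ax.plot(np.cos(t), np.sin(t), zs=t, zdir='z', label='zs=t, zdir="z"')

ax.legend()
```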



The mpl_toolkits.mplot3d.art3d module contains 3D artist code and functions to convert 2D
artists into 3D artists, which can be added to Axes3D. It also provides a rotate_axes function
that reorders coordinates so that the axes are rotated with zdir along. The default value is z.
Prepending the axis with a '-' does the inverse transform, so zdir can be x, -x, y, -y, z, or -z.


How to do it... 


This is the code to demonstrate the concept explained here: 

import random

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

mpl.rcParams['font.size'] = 10

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

years = [2011, 2012, 2013, 2014]
for z in years:
    xs = range(1, 13)                  # months 1..12
    ys = 1000 * np.random.rand(12)     # random monthly sales

    color = plt.cm.Set2(random.choice(range(plt.cm.Set2.N)))
    ax.bar(xs, ys, zs=z, zdir='y', color=color, alpha=0.8)

ax.xaxis.set_major_locator(mpl.ticker.FixedLocator(xs))
ax.yaxis.set_major_locator(mpl.ticker.FixedLocator(years))  # one tick per year

ax.set_xlabel('Month')
ax.set_ylabel('Year')
ax.set_zlabel('Sales Net [usd]')

plt.show()

This code produces the following figure:
How it Works...


We had to do the same prep work as in the 2D world. The difference here is that we needed to
specify what kind of projection ('3d') the axes use. Then we generate random data for
supposedly 4 years of sales (2011-2014).

We needed to specify the same Z value (the year) for all the bars of one series; with
zdir='y', that value is placed along the y axis.

We picked the color randomly from the Set2 colormap, and then we associated it with each year's
collection of xs, ys pairs that is used to render the bar series.


There's more... 


The other plots from 2D matplotlib are available here, for example, scatter(), which has a
similar interface to plot() but lets us control the size of each point marker. We are also familiar
with contour, contourf, and bar.

New types that are available only in 3D are wireframe, surface, and tri-surface plots.

For example, this code example plots a tri-surface plot of the popular Pringle function or, more
mathematically, a hyperbolic paraboloid:

from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
from matplotlib import cm
import matplotlib.pyplot as plt
import numpy as np

n_angles = 36
n_radii = 8

# An array of radii
# Does not include radius r=0, this is to eliminate duplicate points
radii = np.linspace(0.125, 1.0, n_radii)

# An array of angles
angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)

# Repeat all angles for each radius
angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1)

# Convert polar (radii, angles) coords to cartesian (x, y) coords
# (0, 0) is added here. There are no duplicate points in the (x, y) plane
x = np.append(0, (radii * np.cos(angles)).flatten())
y = np.append(0, (radii * np.sin(angles)).flatten())

# Pringle surface
z = np.sin(-x * y)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0.2)
plt.show()

The code will give the following output: 



Creating 3D histograms 


Similarly to 3D bars, we might want to create 3D histograms. These are useful for easily
spotting correlation between three independent variables. They can be used to extract
information from images, in which the third dimension could be the intensity of a channel in
the (x, y) space of the image under analysis.

In this recipe, you will learn how to create 3D histograms.
Getting ready


To recall, a histogram represents the number of occurrences of some value in a particular
column, usually called a bin. A 3D histogram, then, represents the number of occurrences in a
grid. This grid is rectangular, over two variables represented by the data in the two columns.


How to do it... 


For this computation we will:

1. Use NumPy's help, as it has a function for computing the histogram of two variables.

2. Generate x and y from normal distributions, but with different parameters, to be able
to distinguish the correlation in the resulting histogram.

3. Plot the scatter plot of the same dataset, to demonstrate how different the display of
the scatter plot is from the 3D histogram.

Here is the code sample to implement the described steps:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

mpl.rcParams['font.size'] = 10

samples = 25

x = np.random.normal(5, 1, samples)
y = np.random.normal(3, .5, samples)

fig = plt.figure()
ax1 = fig.add_subplot(211, projection='3d')

# compute two-dimensional histogram
hist, xedges, yedges = np.histogram2d(x, y, bins=10)

# compute location of the x,y bar positions
elements = (len(xedges) - 1) * (len(yedges) - 1)
# indexing='ij' keeps the flattened positions in the same order as hist.flatten()
xpos, ypos = np.meshgrid(xedges[:-1] + .25, yedges[:-1] + .25, indexing='ij')

xpos = xpos.flatten()
ypos = ypos.flatten()
zpos = np.zeros(elements)

# make every bar the same width in base
dx = .1 * np.ones_like(zpos)
dy = dx.copy()

# this defines the height of the bar
dz = hist.flatten()

ax1.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b', alpha=0.4)
ax1.set_xlabel('X Axis')
ax1.set_ylabel('Y Axis')
ax1.set_zlabel('Z Axis')

# plot the same x,y correlation in scatter plot
# for comparison
ax2 = fig.add_subplot(212)
ax2.scatter(x, y)
ax2.set_xlabel('X Axis')
ax2.set_ylabel('Y Axis')

plt.show()

This code will give the following output: 
How it Works...


We compute the histogram using np.histogram2d, which returns our histogram
(hist) and the x and y bin edges.

Because the bar3d function needs coordinates in x, y space, we need to compute
the common matrix coordinates, and for that we use np.meshgrid, which combines the x and y
positional vectors into a 2D grid (matrix). We can use this grid to place the bars at
locations in the XY plane.
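The order of the flattened arrays matters here. np.histogram2d returns hist indexed as hist[x_bin, y_bin], and hist.flatten() walks it row by row, so the x bin changes slowest. np.meshgrid with its default indexing='xy' produces grids in which x changes fastest after flattening, which silently transposes the bar heights. The sketch below uses a hand-made four-point dataset and a hypothetical helper, height_at(), to show that indexing='ij' keeps positions and heights aligned.

```python
import numpy as np

# Three of the four points have a high x and a low y.
x = np.array([0.1, 0.8, 0.9, 0.9])
y = np.array([0.1, 0.2, 0.3, 0.8])
hist, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])
# hist[i, j] counts samples in x bin i and y bin j.

dz = hist.flatten()  # [hist[0,0], hist[0,1], hist[1,0], hist[1,1]]

# Default 'xy' indexing: after flattening, x varies fastest (transposed order).
xpos_xy, ypos_xy = np.meshgrid(xedges[:-1], yedges[:-1])
# 'ij' indexing: after flattening, x varies slowest, like hist.flatten().
xpos_ij, ypos_ij = np.meshgrid(xedges[:-1], yedges[:-1], indexing='ij')


def height_at(xpos, ypos, x0, y0):
    """Hypothetical helper: the dz value paired with the bar at corner (x0, y0)."""
    mask = (xpos.flatten() == x0) & (ypos.flatten() == y0)
    return dz[mask][0]


print(hist)
print('xy:', height_at(xpos_xy, ypos_xy, 0.5, 0.0))  # wrong bar height
print('ij:', height_at(xpos_ij, ypos_ij, 0.5, 0.0))  # matches hist[1, 0]
```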

Variables dx and dy represent the width of the base of each bar, and as we want to make this
constant, we give it a value of 0.1 for every position in the XY plane.

The value along the Z axis (dz) is actually our computed histogram (in the variable hist), which
represents the count of x and y samples that fall into a particular bin.

The scatter plot visualizes, in plain 2D axes, the same correlation between the two normal
distributions, which were generated with different parameters.

Sometimes, 3D gives us more information and better explains what the data is showing. More
often, however, 3D visualizations are more confusing than 2D, and it is advisable to think
twice before choosing them over 2D.


Animating in matplotlib


In this recipe, we will explore how to animate our figures. Sometimes it is more descriptive
to have pictures moving in animations to explain what happens if we change the values of
variables. Our main library has limited but usually sufficient animation capabilities, and we
will explain how to use them.


Getting ready 


The animation framework was added to standard matplotlib in version 1.1, and its
main class is matplotlib.animation.Animation. This class is the base class, which
is to be subclassed for specific behavior, as is the case with the classes already provided:
TimedAnimation, ArtistAnimation, and FuncAnimation.


Class name (parent class)        | Description
---------------------------------|---------------------------------------------------------
Animation(object)                | This class wraps the creation of an animation using
                                 | matplotlib. It is only a base class, which should be
                                 | subclassed to provide the required behavior.
TimedAnimation(Animation)        | An Animation subclass that supports time-based
                                 | animation, drawing a new frame every *interval*
                                 | milliseconds.
ArtistAnimation(TimedAnimation)  | Before creating an instance of this class, all plotting
                                 | should have taken place and the relevant artists saved.
FuncAnimation(TimedAnimation)    | Makes an animation by repeatedly calling a function,
                                 | passing in (optional) arguments.


In order to be able to save animations in a video file, we must have ffmpeg or mencoder
installed. Installation of these packages varies depending on the OS used and changes between
releases, so we must leave it to the dear reader to Google the valid information.
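Before calling save(), it can help to ask matplotlib which movie writers it can actually use on the current machine. The sketch below relies on the writer registry in matplotlib.animation (writers.list() and writers.is_available()), as found in recent matplotlib releases. The shutil.which() call is an extra, independent check for an ffmpeg executable on the PATH.

```python
import shutil
import matplotlib.animation as animation

# Writers matplotlib can use right now (depends on installed tools).
available_writers = animation.writers.list()
ffmpeg_ok = animation.writers.is_available('ffmpeg')

# Independent check: is an ffmpeg executable on the PATH?
ffmpeg_on_path = shutil.which('ffmpeg') is not None

print('Usable writers:', available_writers)
print('ffmpeg writer available:', ffmpeg_ok)
print('ffmpeg on PATH:', ffmpeg_on_path)
```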


How to do it... 


Here is the code listing to demonstrate some matplotlib animations:

import numpy as np

from matplotlib import pyplot as plt
from matplotlib import animation

fig = plt.figure()

ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
line, = ax.plot([], [], lw=2)


def init():
    """Clears current frame."""
    line.set_data([], [])
    return line,


def animate(i):
    """Draw figure.

    @param i: Frame counter
    @type i: int
    """
    x = np.linspace(0, 2, 1000)
    y = np.sin(2 * np.pi * (x - 0.01 * i)) * np.cos(22 * np.pi * (x - 0.01 * i))
    line.set_data(x, y)
    return line,


# This call puts the work in motion,
# connecting the init and animate functions and the figure we want to draw.
# Set blit to False if you're under OS X!
animator = animation.FuncAnimation(fig, animate, init_func=init,
                                   frames=200, interval=20, blit=True)

# This call creates the video file.
# Temporarily, every frame is saved as a PNG file
# and later processed by the ffmpeg encoder into an MPEG-4 file.
# We can pass various arguments to ffmpeg via extra_args.
animator.save('basic_animation.mp4', fps=30,
              extra_args=['-vcodec', 'libx264'],
              writer='ffmpeg_file')

plt.show()

This will create the basic_animation.mp4 file in the folder you started this file from, and
also display a figure window with the running animation. The video file can be opened with
most modern video players that support the MPEG-4 format. The figure (frame) should look
like this:
How it Works...


Most important are the init(), animate(), and save() functions. We first construct
FuncAnimation by passing two callback functions to it, init and animate. Then, we call the
save method to save our video file. More details on each function are in the following table:


Function name                        | Usage
-------------------------------------|------------------------------------------------------
init                                 | Passed to the matplotlib.animation.FuncAnimation
                                     | constructor via the init_func parameter. It draws a
                                     | clear frame and is called once, before the first frame
                                     | is drawn.
animate                              | Passed to the matplotlib.animation.FuncAnimation
                                     | constructor via the func parameter.
                                     | The figure we want to animate is passed via the fig
                                     | argument, which is passed under the hood to the
                                     | matplotlib.animation.Animation constructor to connect
                                     | animation events with the figure we want to draw. This
                                     | function receives its (optional) argument from frames,
                                     | usually an iterable; an integer N is treated as range(N).
matplotlib.animation.Animation.save  | Saves a movie file by drawing every frame. With
                                     | file-based writers such as ffmpeg_file, it creates
                                     | temporary image files before processing them through
                                     | the encoder (ffmpeg or mencoder) to create a video file.
                                     | This function also accepts various parameters that
                                     | configure video output, including metadata (author...),
                                     | the codec to use, and resolution/size. One of the
                                     | parameters is writer, which defines what video encoder
                                     | to use. At the time of writing, the supported values
                                     | were ffmpeg, ffmpeg_file, and mencoder.


There's more... 


The usage of matplotlib.animation.ArtistAnimation differs from FuncAnimation
in that we must draw each artist beforehand and then instantiate the ArtistAnimation
class with all the different frames of the artist. ArtistAnimation is a subclass of the
matplotlib.animation.TimedAnimation class, which draws frames every N milliseconds,
thus supporting time-based animation.
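For comparison with the FuncAnimation listing, here is a minimal ArtistAnimation sketch, assuming a recent matplotlib. Every frame is drawn up front as its own Line2D, and each frame is a list of the artists to show. The animation then only toggles which pre-drawn artists are visible, one frame every interval milliseconds. The 30-frame travelling sine wave is an illustrative choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib import animation

fig, ax = plt.subplots()
ax.set_xlim(0, 2)
ax.set_ylim(-1.1, 1.1)
x = np.linspace(0, 2, 200)

# Draw all frames beforehand; each frame is a list of artists.
frames = []
for i in range(30):
    line, = ax.plot(x, np.sin(2 * np.pi * (x - 0.03 * i)), color='C0')
    frames.append([line])

# One frame every 50 ms; only the artists of the current frame are shown.
anim = animation.ArtistAnimation(fig, frames, interval=50, blit=True)
```

Unlike FuncAnimation, all the drawing cost is paid before the animation starts, and every artist stays in memory.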



On Mac OS X, the animation framework can unfortunately be troublesome,
and sometimes simply does not work. This will improve with
future releases of matplotlib.


Animating with OpenGL


The motivation to use OpenGL stems from the limitations of CPU processing power when we are
faced with the task of visualizing millions of data points and doing it fast (sometimes even in
real time).

Modern computers have powerful GPUs that are made for fast visualization-related
computations (such as games), and there is no reason why they can't be used for
science-related visualizations.

There is, however, at least one drawback to writing hardware-accelerated software: it is
hardware dependent. Modern graphics cards require proprietary drivers, which are sometimes
not available on the target platform/machine (the user's laptop, for example). Even when
available, sometimes installing the required dependencies on-site is not what you want to
spend your time on, while all you want is to present your findings and demonstrate your
research results. This is not a showstopper, but you should bear this in mind and measure the
benefits and costs of introducing this complexity in your project.

With the caveats expiained, we can say yes to hardware-acceierated visuaiizations and to 
OpenGL, which is the industry Standard for acceierated graphics. 

We wiii be using OpenGL as it is cross-piatform, so the exampies shouid work as presented on 
Linux, Mac, or Windows, provided you have the required hardware and OS-ievei drivers instaiied. 


Getting ready 


If you have never used OpenGL, we will now try to give a quick introduction, although to really
understand OpenGL, at least one complete book needs to be read. OpenGL is a specification,
not an implementation, so OpenGL itself doesn't have any code, while the implementations
are libraries developed according to this specification. These are shipped with your operating
system or by vendors of graphics cards such as NVIDIA or AMD/ATI.

Moreover, OpenGL is concerned only with graphics rendering and not with animation, timing, or
other "complex" things that are left for additional libraries to pick up.



Basics of animating with OpenGL

Since OpenGL is a rendering library, it does not know what objects we draw
on a screen. It doesn't care if we draw a cat, a ball, a line, or all of these
objects. So, to move a rendered object, we need to clear and draw the
whole image again. To animate something, we need a loop that draws and
redraws everything very quickly and displays it to the user, so that the user
perceives an animation.
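To make the clear-and-redraw idea concrete without any OpenGL code, here is a toy sketch in which a small NumPy array stands in for the framebuffer. It only illustrates the loop described above; it is not how a real OpenGL program manages its buffers.

```python
import numpy as np

WIDTH, HEIGHT = 8, 3


def render(frame_no):
    # "Clear": every frame starts from an empty buffer, so nothing
    # drawn in the previous frame survives.
    buf = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
    # "Draw": the ball's position is recomputed from scratch each frame.
    x = frame_no % WIDTH
    buf[HEIGHT // 2, x] = 1
    return buf


# The animation loop: render and "present" frames one after another.
frames = [render(i) for i in range(WIDTH)]
for f in frames[:3]:
    print("\n".join("".join("o" if p else "." for p in row) for row in f))
    print()
```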

Installing OpenGL on a machine is a platform-dependent process. On Mac OS X, the OpenGL
implementation is part of the OS upgrade, but the development libraries (so-called headers) are
part of the Xcode development package.

On Windows, the best way would be to install your graphics card vendor's latest drivers.
OpenGL may work without them, but with stock drivers you will probably be left without the
latest features.

On Linux, if you are not against installing closed source software, there are vendor-specific
drivers downloadable either from the distro's own software manager or from the vendor site
as an installable binary. The standard implementation is almost always Mesa3D, the best
known OpenGL implementation, which uses Xorg to provide support for OpenGL for Linux,
FreeBSD, and similar operating systems.

On Debian/Ubuntu, you should install the following packages and their dependencies:

$ sudo apt-get install libgl1-mesa-dev libgl1-mesa-dri

After this, you should be ready to use some development libraries and/or frameworks to
actually write OpenGL-backed applications.

We are focused here on Python, so we will give an overview of some of Python's most used
libraries and frameworks that are built on top of OpenGL. We will mention matplotlib and its
current and future support for OpenGL:

► Mayavi: This is a library specialized for 3D visualization

► Pyglet: This is a pure Python library for graphics

► Glumpy: This is a fast rendering library built on top of NumPy


How to do it... 


Mayavi, a specialized project, is a full-featured 3D graphics library, which is mainly used for
advanced 3D rendering. It comes with the already mentioned Python distributions such as EPD
(though not with a free license), which is the recommended way of installing it on Windows and
Mac OS X. On Linux, it can also be easily installed using pip:

$ pip install mayavi

Mayavi can be used as a development library/framework or as an application. The Mayavi
application comprises a visual editor for easy data exploration and somewhat interactive
visualization.

As a library, it can be used in a similar way to matplotlib: either through a scripting interface
or as a full object-oriented library. Most of the scripting interface lives in the mlab module.
For example, a simple animation with Mayavi can be done as follows:

import numpy

from mayavi.mlab import plot3d

# Produce some nice data.
n_mer, n_long = 6, 11

pi = numpy.pi
dphi = pi/1000.0

phi = numpy.arange(0.0, 2*pi + 0.5*dphi, dphi, 'd')
mu = phi*n_mer

x = numpy.cos(mu)*(1 + numpy.cos(n_long*mu/n_mer)*0.5)
y = numpy.sin(mu)*(1 + numpy.cos(n_long*mu/n_mer)*0.5)
z = numpy.sin(n_long*mu/n_mer)*0.5

# View it.
l = plot3d(x, y, z, numpy.sin(mu), tube_radius=0.025,
           colormap='Spectral')

# Now animate the data: mlab_source lets us update the plotted
# points and scalars in place instead of creating a new plot.
ms = l.mlab_source

for i in range(100):
    x = numpy.cos(mu)*(1 + numpy.cos(n_long*mu/n_mer +
                                     numpy.pi*(i+1)/5.)*0.5)
    scalars = numpy.sin(mu + numpy.pi*(i+1)/5)
    ms.set(x=x, scalars=scalars)

This code will produce the following window with a rotating figure:


How it Works...


We generate a dataset and compute the x, y, and z coordinates to be passed to the plot3d
function as the start position of the figure.

We then get the mlab_source object, which enables us to manipulate our plot on the level
of points and scalars. We then use this feature to set particular points and scalars in a for loop
to create a rotation animation with 100 frames.


There's more... 


If you want to experiment more, the easiest way to do so is to load IPython, import
mayavi.mlab, and run some of the test_* functions.

To see what is going on, you can use IPython's ability to inspect and explore Python source,
as follows:

In [1]: import mayavi.mlab

In [2]: mayavi.mlab.test_simple_surf??

Type: function
String Form: <function test_simple_surf at 0x641b410>
File: /usr/lib/python2.7/dist-packages/mayavi/tools/helper_functions.py
Definition: mayavi.mlab.test_simple_surf()
Source:
def test_simple_surf():
    """Test Surf with a simple collection of points."""
    x, y = numpy.mgrid[0:3:1,0:3:1]
    return surf(x, y, numpy.asarray(x, 'd'))

We can see here how, by adding two question marks after the function name (??), IPython
found the source of the function and showed it to us. This is true exploratory computing, and
it is often used within the visualization community because it is a fast way to get to know your
data and code.
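Outside IPython, the standard library's inspect module offers roughly the same capability as ??. A minimal sketch follows; it uses textwrap.dedent as a stand-in because it is always available, but with Mayavi installed, inspect.getsource(mayavi.mlab.test_simple_surf) should print the same source shown above.

```python
import inspect
import textwrap

# inspect.getsource() returns the source text that "??" displays.
# It works for functions written in Python, not for C builtins such
# as len, which raise TypeError instead.
source = inspect.getsource(textwrap.dedent)
print(source.splitlines()[0])
print(inspect.getsourcefile(textwrap.dedent))
```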




6 

Plotting Charts with Images and Maps


This chapter contains recipes that will cover: 

► Processing images with PIL

► Plotting with images

► Displaying images with other plots in the figure

► Plotting data on a map using Basemap

► Plotting data on a map using the Google Map API

► Generating CAPTCHA images


Introduction 


This chapter explores how to work with images and maps. Python has some well-known image
libraries that allow us to process images in both aesthetic and scientific ways.

First, we will introduce the capabilities of PIL (and its friendly fork Pillow) by demonstrating
how to process images by applying filters and resizing them.

Furthermore, we will show you how to use image files as annotations for our matplotlib charts.

To deal with data visualization of geospatial datasets, we will cover the functionality
of Python's available libraries and public APIs that we can use with map-based visual
representations.

The final recipe shows how Python can create CAPTCHA test images.


Processing images with PIL


Why use Python for image processing if we can use WIMP (http://en.wikipedia.org/
wiki/WIMP_(computing)) or WYSIWYG (http://en.wikipedia.org/wiki/WYSIWYG)
to achieve the same goal? Python is used because we want to create an automated system to
process images in real time without human support, thus optimizing the image pipeline.


Getting ready 


Note that the PIL coordinate system assumes that the (0, 0) coordinate is in the upper-left corner.

The Image module has useful class and instance methods for performing basic operations
over a loaded image object (im):

► im = Image.open(filename): This opens a file and loads the image into the
im object.

► im.crop(box): This crops the image inside the coordinates defined by box. box
defines the left, upper, right, and lower pixel coordinates (for example,
box = (0, 0, 100, 100)).

► im.filter(filter): This applies a filter on the image and returns a filtered image.

► im.histogram(): This returns a histogram list for this image, where each item
represents the number of pixels having a given value. The number of items in the list is
256 for single channel images, but if the image is not a single channel image, there can be
more items in the list. For an RGB image, the list contains 768 items (one set of 256
values for each channel).

► im.resize(size, filter): This resizes the image and uses a filter for
resampling. The possible filters are NEAREST, BILINEAR, BICUBIC, and
ANTIALIAS. The default is NEAREST. (Recent Pillow releases have removed the
ANTIALIAS name in favor of LANCZOS and default to BICUBIC.)

► im.rotate(angle, filter): This rotates an image by angle degrees in the
counterclockwise direction.

► im.split(): This splits the bands of an image and returns a tuple of individual
bands. This is useful for splitting an RGB image into three single-band images.

► im.transform(size, method, data, filter): This applies a transformation
on a given image using data and a filter. The transformation can be AFFINE,
EXTENT, QUAD, or MESH. You can read more about transformations in the official
documentation. For EXTENT, data defines the box in the original image where the
transformation will be applied; for the other methods, it holds method-specific
parameters (for example, the six coefficients of an affine transformation).
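A minimal sketch of several of these methods follows. It assumes Pillow is installed and builds a synthetic 200 x 100 image with Image.new() instead of opening a file, so the sizes and histogram counts are easy to check.

```python
from PIL import Image

# A synthetic 200x100 RGB image stands in for Image.open(filename).
img = Image.new("RGB", (200, 100), (10, 20, 30))

cropped = img.crop((0, 0, 100, 100))          # (left, upper, right, lower)
small = img.resize((50, 25), Image.BILINEAR)  # explicit resampling filter
rotated = img.rotate(90, expand=True)         # counterclockwise; expand=True grows the canvas
r, g, b = img.split()                         # three single-band ("L") images
hist = img.histogram()                        # 256 counts per band, bands concatenated

print(cropped.size, small.size, rotated.size, len(hist))
```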

The ImageDraw module allows us to draw over the image, where we can use functions such
as arc, ellipse, line, pieslice, point, and polygon to modify the pixels of the loaded
image.
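Here is a short sketch of ImageDraw in use, assuming Pillow and drawing on a synthetic white canvas rather than a loaded file; the coordinates and colors are arbitrary.

```python
from PIL import Image, ImageDraw

img = Image.new("RGB", (100, 100), "white")
draw = ImageDraw.Draw(img)  # drawing context that modifies img in place

draw.line((0, 0, 99, 99), fill="black")        # diagonal line
draw.ellipse((20, 20, 80, 80), outline="red")  # ellipse inside this bounding box
draw.point((5, 90), fill="blue")               # a single pixel
draw.polygon([(60, 5), (95, 5), (95, 30)], fill="green")  # filled triangle
```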

The ImageChops module contains a number of image channel operations (hence the
name chops) that can be used for image composition, painting, special effects, and other
processing operations. Channel operations are allowed only for 8-bit images. Here are some
interesting channel operations:

► ImageChops.duplicate(image): This copies the current image into a new image
object

► ImageChops.invert(image): This inverts an image and returns a copy

► ImageChops.difference(image1, image2): This is useful for verifying that
images are the same without visual inspection
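The following sketch, assuming Pillow, shows how ImageChops.difference() can be combined with getbbox() to check whether two images are identical, and where they differ when they are not.

```python
from PIL import Image, ImageChops

a = Image.new("RGB", (32, 32), "white")
b = ImageChops.duplicate(a)  # independent copy of a

# getbbox() returns None when every pixel of the difference is zero,
# i.e. the two images are identical.
identical = ImageChops.difference(a, b).getbbox() is None

b.putpixel((10, 5), (0, 0, 0))  # change a single pixel
changed_box = ImageChops.difference(a, b).getbbox()  # (left, upper, right, lower)

inverted = ImageChops.invert(a)  # white becomes black
print(identical, changed_box, inverted.getpixel((0, 0)))
```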

The ImageFilter module contains the implementation of the Kernel class that allows the
creation of custom convolution kernels. This module also contains a set of common
filters that allows the application of well-known filters (BLUR and MedianFilter) to our image.

There are two types of filters provided by the ImageFilter module: fixed image
enhancement filters and image filters that require certain arguments to be defined, for
example, the size of the kernel to be used.


We can easily get the list of all fixed filter names in IPython:



In [1]: from PIL import ImageFilter

In [2]: [f for f in dir(ImageFilter) if f.isupper()]

Out[2]:
['BLUR',
 'CONTOUR',
 'DETAIL',
 'EDGE_ENHANCE',
 'EDGE_ENHANCE_MORE',
 'EMBOSS',
 'FIND_EDGES',
 'SHARPEN',
 'SMOOTH',
 'SMOOTH_MORE']


The next example shows how we can apply all currently supported fixed filters on any
supported image:

import os
import sys

from PIL import Image, ImageFilter


class DemoPIL(object):

    def __init__(self, image_file=None):
        # names of the fixed (argument-less) filters, e.g. BLUR or CONTOUR;
        # the isinstance check skips any other uppercase module attributes
        self.fixed_filters = [ff for ff in dir(ImageFilter)
                              if ff.isupper() and
                              isinstance(getattr(ImageFilter, ff), ImageFilter.Filter)]

        assert image_file is not None
        assert os.path.isfile(image_file) is True

        self.image_file = image_file
        self.image = Image.open(self.image_file)

    def _make_temp_dir(self):
        from tempfile import mkdtemp
        self.ff_tempdir = mkdtemp(prefix="ff_demo")

    def _get_temp_name(self, filter_name):
        name, ext = os.path.splitext(os.path.basename(self.image_file))
        newimage_file = name + "-" + filter_name + ext
        path = os.path.join(self.ff_tempdir, newimage_file)
        return path

    def _get_filter(self, filter_name):
        # look up the filter object by its name on the ImageFilter module
        real_filter = getattr(ImageFilter, filter_name)
        return real_filter

    def apply_filter(self, filter_name):
        print("Applying filter: " + filter_name)
        filter_callable = self._get_filter(filter_name)

        temp_img = None
        # prevent calling non-fixed filters for now
        if filter_name in self.fixed_filters:
            temp_img = self.image.filter(filter_callable)
        else:
            print("Can't apply non-fixed filter now.")
        return temp_img

    def run_fixed_filters_demo(self):
        self._make_temp_dir()
        for ffilter in self.fixed_filters:
            temp_img = self.apply_filter(ffilter)
            temp_img.save(self._get_temp_name(ffilter))
        print("Images are in: {0}".format(self.ff_tempdir))


if __name__ == "__main__":
    assert len(sys.argv) == 2

    demo_image = sys.argv[1]
    demo = DemoPIL(demo_image)

    # will create set of images in temporary folder
    demo.run_fixed_filters_demo()

We can run this easily from the command prompt: 

$ python ch06_rec01_01_pil_demo.py image.jpeg

We packed our little demo in the DemoPIL class, so we can extend it easily while sharing
the common code around the run_fixed_filters_demo demo function. Common code
here includes opening the image file, testing if the file is really a file, creating a temporary
directory to hold our filtered images, building the filtered image filename, and printing useful
information to the user. This way the code is organized in a better manner, and we can easily
focus on our demo function without touching other parts of the code.

This demo will open our image file, apply every fixed filter available in ImageFilter to it,
and save each new filtered image in a unique temporary directory. At the end of the process,
the script prints the path of the temporary directory used so that we can check the output of
the filters.

As an optional exercise, try extending this demo class to apply the other filters available in
ImageFilter to the given image.
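As a starting point for that exercise, the following sketch applies three filters that take arguments: MedianFilter, GaussianBlur, and a custom 3 x 3 Kernel. The keys of the param_filters dictionary are made up for this example, and a uniform synthetic image replaces the demo's input file, which makes the box-blur result easy to predict.

```python
from PIL import Image, ImageFilter

# Uniform synthetic image, standing in for the demo's image file.
img = Image.new("RGB", (64, 64), (200, 30, 30))

# Non-fixed filters are classes that must be instantiated with
# arguments, unlike the uppercase constants in fixed_filters.
param_filters = {
    "median_5": ImageFilter.MedianFilter(size=5),
    "gaussian_2": ImageFilter.GaussianBlur(radius=2),
    # 3x3 custom convolution kernel (a box blur); the scale
    # defaults to the sum of the weights, here 9
    "box_3x3": ImageFilter.Kernel((3, 3), [1] * 9),
}

filtered = {name: img.filter(f) for name, f in param_filters.items()}
for name, im in filtered.items():
    print(name, im.size, im.getpixel((32, 32)))
```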


How to do it... 


The example in this section shows how we can process all the images in a certain folder.
We specify a target path, and the program reads all the image files in that target path
(images folder), resizes them to a specified ratio (0.1 in this example), and saves each one
in a target folder called thumbnail_folder:


import os
import sys

from PIL import Image


class Thumbnailer(object):

    def __init__(self, src_folder=None):
        self.src_folder = src_folder
        self.ratio = .3
        self.thumbnail_folder = "thumbnails"

    def _create_thumbnails_folder(self):
        thumb_path = os.path.join(self.src_folder, self.thumbnail_folder)
        if not os.path.isdir(thumb_path):
            os.makedirs(thumb_path)

    def _build_thumb_path(self, image_path):
        root = os.path.dirname(image_path)
        name, ext = os.path.splitext(os.path.basename(image_path))
        suffix = ".thumbnail"
        return os.path.join(root, self.thumbnail_folder, name + suffix + ext)

    def _load_files(self):
        files = set()
        for each in os.listdir(self.src_folder):
            each = os.path.abspath(os.path.join(self.src_folder, each))
            if os.path.isfile(each):
                files.add(each)
        return files

    def _thumb_size(self, size):
        return (int(size[0] * self.ratio), int(size[1] * self.ratio))

    def create_thumbnails(self):
        self._create_thumbnails_folder()
        files = self._load_files()

        for each in files:
            print("Processing: " + each)
            try:
                img = Image.open(each)
                thumb_size = self._thumb_size(img.size)
                # LANCZOS is the filter that older PIL versions called ANTIALIAS
                resized = img.resize(thumb_size, Image.LANCZOS)
                savepath = self._build_thumb_path(each)
                resized.save(savepath)
            except IOError as ex:
                print("Error: " + str(ex))


if __name__ == "__main__":
    # Usage:
    # python ch06_rec01_02_pil_thumbnails.py my_images
    assert len(sys.argv) == 2

    src_folder = sys.argv[1]

    if not os.path.isdir(src_folder):
        print("Error: Path '{0}' does not exist.".format(src_folder))
        sys.exit(-1)

    thumbs = Thumbnailer(src_folder)

    # optionally set the name of the thumbnail folder, relative to src_folder
    thumbs.thumbnail_folder = "THUMBS"

    # define ratio to resize image to
    # 0.1 means the original image will be resized to 10% of its size
    thumbs.ratio = 0.1

    # create the thumbnails inside src_folder/THUMBS
    thumbs.create_thumbnails()


How it Works... 


For the given folder src_folder, we load all the files in this folder and try to load each file
using Image.open(); this is the logic of the create_thumbnails() function. If the file
we try to load is not an image, IOError will be thrown; the function prints the error and skips
to the next file in the sequence.
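The following sketch isolates that error handling. It creates a temporary folder holding one real PNG and one text file (both names are made up), then tries to open each; only the text file should end up in the skipped list. In recent Pillow versions the exception raised is UnidentifiedImageError, a subclass of IOError (OSError), so the same except clause catches it.

```python
import os
import tempfile

from PIL import Image

folder = tempfile.mkdtemp()
Image.new("RGB", (40, 20), "white").save(os.path.join(folder, "ok.png"))
with open(os.path.join(folder, "notes.txt"), "w") as f:
    f.write("not an image")

opened, skipped = [], []
for name in sorted(os.listdir(folder)):
    path = os.path.join(folder, name)
    try:
        with Image.open(path) as img:
            opened.append((name, img.size))
    except IOError:  # non-image files end up here and are skipped
        skipped.append(name)

print(opened, skipped)
```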

If we want to have more control over which files we load, we should change the
_load_files() function to only include files with a certain extension (file type):

for each in os.listdir(self.src_folder):
    each = os.path.abspath(os.path.join(self.src_folder, each))
    # compare the lowercased extension against an allow-list
    if os.path.isfile(each) and os.path.splitext(each)[1].lower() in ('.jpg', '.png'):
        files.add(each)

This is not foolproof, as the file extension does not define the file type; it just helps the
operating system to attach a default program to the file. But it works in the majority of cases
and is simpler than reading a file header to determine the file content (which still does not
guarantee that the file really is what its first couple of bytes say it is).
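For comparison, here is a sketch of the header-reading approach mentioned above. It checks only two well-known signatures (PNG and JPEG), and the helper names are hypothetical. As noted, a matching prefix is still no guarantee that the rest of the file is valid.

```python
# Magic-number prefixes for two common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def image_type_from_header(head):
    """Return 'png', 'jpeg', or None for the first bytes of a file."""
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


def sniff_image_type(path):
    # Only the first 8 bytes are needed to recognise these signatures.
    with open(path, "rb") as f:
        return image_type_from_header(f.read(8))
```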


There's more... 


With PIL, although not used very often, we can easily convert images from one format to
another. This is achievable with two simple operations: first, open an image in a source format
using open(), and then save that image in another format using save(). The format is
defined either implicitly via the filename extension (.png or .jpeg) or explicitly via the format
argument passed to the save() function.
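A minimal sketch of both variants follows, assuming Pillow. The implicit form relies on the .png extension; the explicit form writes into an in-memory io.BytesIO buffer, which has no extension, so format= is required. The image is converted to RGB first because JPEG cannot store an alpha channel.

```python
import io
import os
import tempfile

from PIL import Image

src = Image.new("RGBA", (40, 30), (0, 128, 255, 255))

# Implicit: the format is inferred from the ".png" extension.
png_path = os.path.join(tempfile.mkdtemp(), "converted.png")
src.save(png_path)

# Explicit: a BytesIO has no extension, so format= is required.
# JPEG has no alpha channel, so convert RGBA to RGB first.
buf = io.BytesIO()
src.convert("RGB").save(buf, format="JPEG")
buf.seek(0)

with Image.open(png_path) as png_img:
    png_format = png_img.format
jpeg_img = Image.open(buf)
print(png_format, jpeg_img.format, jpeg_img.size)
```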


Plotting with images


Images can be used to highlight the strengths of your visualization in addition to pure data
values. Many examples have shown that by using symbolic images, we map deeper into the
viewer's mental model, thereby helping the viewer to remember the visualizations better and
for a longer time. One way to do this is to place images where your data is, to map the values
to what they represent. The matplotlib library is capable of delivering this functionality, and
here we demonstrate how to do it.


Getting ready 


We will use the fictional example from the story The Gospel of the Flying Spaghetti Monster,
by Bobby Henderson, where the author correlates the number of pirates with the sea-surface
temperature. To highlight this correlation, we will display the size of the pirate ship proportional
to the value representing the number of pirates in the year the sea-surface temperature
is measured.

We will use the Python matplotlib library's ability to annotate using images and text with
advanced location settings, as well as its arrow capabilities.

All the files required in the following recipe are available in the source code repository in the
Chapter06 folder.


How to do it... 


The following example shows how to add an annotation to a chart using images and text:

import matplotlib.pyplot as plt
# the private matplotlib._png module no longer exists in recent matplotlib
# releases; matplotlib.image.imread reads PNG files into the same kind of array
from matplotlib.image import imread as read_png

from matplotlib.offsetbox import TextArea, OffsetImage, \
    AnnotationBbox


def load_data():
    import csv

    with open('pirates_temperature.csv', 'r') as f:
        reader = csv.reader(f)
        header = next(reader)
        datarows = []
        for row in reader:
            datarows.append(row)
    return header, datarows


def format_data(datarows):
    years, temps, pirates = [], [], []

    for each in datarows:
        # convert the CSV strings to numbers so that matplotlib treats
        # the axes as numeric rather than categorical
        years.append(int(each[0]))
        temps.append(float(each[1]))
        pirates.append(int(each[2]))
    return years, temps, pirates

After we have defined the helper functions, we can approach the construction of the figure
object and add subplots. We will annotate these for every year in the collection of years using
the image of the ship, scaling the image to the appropriate size:

if __name__ == "__main__":

    fig = plt.figure(figsize=(16, 8))
    ax = plt.subplot(111)  # add sub-plot

    header, datarows = load_data()

    xlabel, ylabel, _ = header

    years, temperature, pirates = format_data(datarows)

    title = "Global Average Temperature vs. Number of Pirates"

    plt.plot(years, temperature, lw=2)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)

    # load pirate image
    pirate = read_png('tall-ship.png')

    # for every data point annotate with image and number
    for x in range(len(years)):

        # current data coordinate
        xy = years[x], temperature[x]

        # add a black dot at the data point
        ax.plot(xy[0], xy[1], "ok")

        # zoom coefficient: scale the image by the number of pirates
        zoomc = int(pirates[x]) * (1 / 90000.)

        # create OffsetImage
        imagebox = OffsetImage(pirate, zoom=zoomc)

        # create annotation bbox with image and set up properties
        ab = AnnotationBbox(imagebox, xy,
                            xybox=(-200.*zoomc, 200.*zoomc),
                            xycoords='data',
                            boxcoords="offset points",
                            pad=0.1,
                            arrowprops=dict(arrowstyle="->",
                                            connectionstyle="angle,angleA=0,angleB=-30,rad=3"))

        ax.add_artist(ab)

        # add text
        no_pirates = TextArea(str(pirates[x]))
        ab = AnnotationBbox(no_pirates, xy,
                            xybox=(50., -25.),
                            xycoords='data',
                            boxcoords="offset points",
                            pad=0.3,
                            arrowprops=dict(arrowstyle="->",
                                            connectionstyle="angle,angleA=0,angleB=-30,rad=3"))

        ax.add_artist(ab)

    plt.grid(True)
    plt.xlim(1800, 2020)
    plt.ylim(14, 16)
    plt.title(title)

    plt.show()

The preceding code should give the following plot:



How it Works... 


We start by creating a figure of a decent size, that is, 16 x 8 inches. We need this size to fit the
images we want to display. Now, we load our data from the file, using the csv module.
Instantiating the csv reader object, we can iterate over the data from the file row by row.

Note that the first row is special: it is the header describing our columns. As we have
plotted years on the x axis and temperature on the y axis, we read that:

xlabel, ylabel, _ = header

and use the following lines:

plt.xlabel(xlabel)
plt.ylabel(ylabel)

We used a neat Python convention here to unpack the header into three
variables, where using _ as the variable name indicates that we are not
interested in the value of that variable.

We return the header and datarows lists from the load_data function to the main caller.

Using the format_data() function, we read every item in the list and add each separate
entity (year, temperature, and number of pirates) into the relevant list for that entity.

Year is displayed along the x axis, while temperature is on the y axis. The number of pirates is
displayed as an image of a pirate ship, and, to add precision, the value is also displayed as text.

We plot year/temperature values using the standard plot() function, not adding anything
more, apart from making the line a bit wider (2 pt).

We then proceed to add one image for every measurement to illustrate the
number of pirates for a given year. For this, we loop over the range of indexes
(range(len(years))), plotting one black point on each year/temperature coordinate:

ax.plot(xy[0], xy[1], "ok")

The image of the ship is loaded from the file into a suitable array format using the read_png
helper function:

pirate = read_png('tall-ship.png')

We then compute the zoom coefficient (zoomc) to enable us to scale the size of the image in
proportion to the number of pirates for the current (pirates[x]) measurement. We also use
the same coefficient to position the image along the plot.

The actual image is then instantiated inside OffsetImage, the image container with a position
relative to its parent (AnnotationBbox).

AnnotationBbox is an annotation-like class, but instead of displaying just text as with the
Axes.annotate function, it can display other OffsetBox instances. This allows us to load
an image or text object in an annotation and locate it at a particular distance from the data
point, as well as allowing us to use the arrowing capabilities (arrowprops) to precisely point
to an annotated data point.

We supply the AnnotationBbox constructor with certain arguments:

► imagebox: This must be an instance of OffsetBox (for example, OffsetImage); it
is the content of the annotation box

► xy: This is the data point coordinate that the annotation relates to

► xybox: This defines the location of the annotation box

► xycoords: This defines what coordinate system is used by xy (for example, data
coordinates)

► boxcoords: This defines what coordinate system is used by xybox (for example,
offset from the xy location)

► pad: This specifies the amount of padding

► arrowprops: This is the dictionary of properties for drawing an arrow connection
from an annotation-bounding box to a data point
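To see these arguments in isolation, here is a minimal sketch that annotates a single data point with an image. A small NumPy gradient array stands in for the ship picture, and the Agg backend is used so it runs without a display; the offsets and zoom are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import AnnotationBbox, OffsetImage

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 3, 1], "ok")

# A 10x10 gradient array stands in for the ship image loaded from disk.
picture = np.linspace(0, 1, 100).reshape(10, 10)
imagebox = OffsetImage(picture, zoom=2.0, cmap="gray")

ab = AnnotationBbox(
    imagebox,                    # content of the annotation box
    (2, 3),                      # xy: the data point being annotated
    xybox=(-40.0, 40.0),         # where the box is placed...
    xycoords="data",             # ...xy is in data coordinates
    boxcoords="offset points",   # ...xybox is an offset in points from xy
    pad=0.1,
    arrowprops=dict(arrowstyle="->"),
)
ax.add_artist(ab)
fig.canvas.draw()  # render once to make sure the annotation draws
```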

We add a text annotation to this plot, using the same data items from the pirates list with a
slightly different relative position. Most of the arguments of the second AnnotationBbox are
the same; we adjust xybox and pad to locate the text on the opposite side of the line. The text
is inside a TextArea class instance. This is similar to what we do with the image, but with
text this time. TextArea and OffsetImage inherit from the same OffsetBox parent class.

We store this TextArea instance, holding the number of pirates, in no_pirates and put it in our
AnnotationBbox.


Displaying images with other plots in the figure


This recipe will show how we can make simple yet effective use of the Python matplotlib library
to process image channels and display the per-channel histogram of an external image.


Getting ready


We have provided some sample images, but the code is ready to load any image file, provided
it is supported by matplotlib's imread function.

In this recipe, you will learn how to combine different matplotlib plots to achieve the functionality
of a simple image viewer that displays an image histogram for the red, green, and blue channels.


How to do it...


To show how to build an image histogram viewer, we are going to implement a simple class
named ImageViewer, and that class will contain helper methods to:

1. Load an image.

2. Separate the RGB channels from the image matrix.

3. Configure the figure and axes (subplots).

4. Plot the channel histograms.

5. Plot the image.

The following code shows how to build an image histogram viewer: 

import matplotlib.pyplot as plt
import matplotlib.image as mplimage
import matplotlib as mpl
import os


class ImageViewer(object):

    def __init__(self, imfile):
        self._load_image(imfile)
        self._configure()

        self.figure = plt.gcf()
        t = "Image: {0}".format(os.path.basename(imfile))
        self.figure.suptitle(t, fontsize=20)

        self.shape = (3, 2)

    def _configure(self):
        mpl.rcParams['font.size'] = 10
        mpl.rcParams['figure.autolayout'] = False
        mpl.rcParams['figure.figsize'] = (9, 6)
        mpl.rcParams['figure.subplot.top'] = .9

    def _load_image(self, imfile):
        self.im = mplimage.imread(imfile)

    @staticmethod
    def _get_chno(ch):
        chmap = {'R': 0, 'G': 1, 'B': 2}
        return chmap.get(ch, -1)

    def show_channel(self, ch):
        bins = 256
        ec = 'none'

        chno = self._get_chno(ch)
        loc = (chno, 1)

        ax = plt.subplot2grid(self.shape, loc)
        # matplotlib single-letter color codes are lowercase ('r', 'g', 'b')
        ax.hist(self.im[:, :, chno].flatten(), bins, color=ch.lower(), ec=ec,
                label=ch, alpha=.7)
        # assumes 8-bit channel values (e.g. JPEG); PNGs load as floats in [0, 1]
        ax.set_xlim(0, 255)

        plt.setp(ax.get_xticklabels(), visible=True)
        plt.setp(ax.get_yticklabels(), visible=False)
        plt.setp(ax.get_xticklines(), visible=True)
        plt.setp(ax.get_yticklines(), visible=False)
        plt.legend()

        plt.grid(True, axis='y')
        return ax

    def show(self):
        loc = (0, 0)

        axim = plt.subplot2grid(self.shape, loc, rowspan=3)
        axim.imshow(self.im)

        plt.setp(axim.get_xticklabels(), visible=False)
        plt.setp(axim.get_yticklabels(), visible=False)
        plt.setp(axim.get_xticklines(), visible=False)
        plt.setp(axim.get_yticklines(), visible=False)
        axr = self.show_channel('R')
        axg = self.show_channel('G')
        axb = self.show_channel('B')
        plt.show()


if __name__ == '__main__':
    im = 'images/yellow_flowers.jpg'
    try:
        iv = ImageViewer(im)
        iv.show()
    except Exception as ex:
        print(ex)


How it Works... 


Reading from the end of the code, we see a hard-coded filename. It can be swapped by
loading the argument from the command line and parsing the given argument into the im
variable using the sys.argv sequence.
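One way to do that is sketched below with the standard argparse module instead of reading sys.argv directly. The parse_args helper is hypothetical; the main block would then call ImageViewer(parse_args().image).

```python
import argparse

DEFAULT_IMAGE = "images/yellow_flowers.jpg"


def parse_args(argv=None):
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing.
    parser = argparse.ArgumentParser(
        description="Show an image next to its R, G and B histograms.")
    parser.add_argument("image", nargs="?", default=DEFAULT_IMAGE,
                        help="path to an image file readable by imread")
    return parser.parse_args(argv)


print(parse_args([]).image)
print(parse_args(["photos/cat.png"]).image)
```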

We instantiate the ImageViewer class with the provided path to an image file. During object
instantiation, we try to load an image file into an array, configure the figure via the rcParams
dictionary, set the figure size and title, and define the object fields (self.shape) to be used
inside the object's methods.

The main method here is show(), which creates a layout for the figure and loads the image
array into the main (left column) subplot. We hide all ticks and tick labels, as this is the
actual image, where we don't need ticks.

We then call the show_channel() method for each of the red, green, and blue
channels. This method also creates new subplot axes, this time in the right-hand side column,
with each one in a separate row. We plot the histogram for each channel in a separate subplot.

We also hide the unnecessary y ticks and y tick labels, and add a legend in case we
want to print this figure in a non-color environment, so that we can still tell the
channels apart.

After we run this code, we will get the following screenshot:



There's more... 


The use of the histogram plot type is just a choice for this image viewer example. We could
have used any of the matplotlib-supported plot types. Another real-world example would be to
plot an EEG or similar medical records, where we would want to display a slice as an image, the
time series of the recorded EEG as a line plot, and also additional meta information about the
data shown, which would probably go into matplotlib.text.Text artists.

Having the ability to interact with user GUI events, matplotlib's figure also allows us to
implement interaction where we would want to zoom into all plots if we manually zoom on
one plot only. That would be another usage where we want to display an image and zoom
into it while also zooming into other displayed plots in the currently active figure. An idea
would be to use motion_notify_event to call a function that will update the x and y limits
for all axes (subplots) in the current figure.
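The sketch below shows a variation on that idea. Instead of motion_notify_event, it listens to the xlim_changed and ylim_changed callbacks that every axes emits when its limits change, and copies the new limits to all other axes in the figure; the synthetic image and the call to set_xlim() stand in for a user zooming with the toolbar. For axes that should always move together, creating them with sharex=True and sharey=True is a simpler alternative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 2)
img = np.arange(2500).reshape(50, 50)  # synthetic image
for ax in axes:
    ax.imshow(img)

_syncing = False


def sync_limits(changed_ax):
    # Copy the limits of the axes that changed onto every other axes.
    global _syncing
    if _syncing:  # set_xlim below fires this callback again; ignore it
        return
    _syncing = True
    try:
        for ax in changed_ax.figure.axes:
            if ax is not changed_ax:
                ax.set_xlim(changed_ax.get_xlim())
                ax.set_ylim(changed_ax.get_ylim())
    finally:
        _syncing = False


for ax in axes:
    ax.callbacks.connect("xlim_changed", sync_limits)
    ax.callbacks.connect("ylim_changed", sync_limits)

axes[0].set_xlim(10, 20)  # simulate zooming into the first image
```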


Plotting data on a map using Basemap 


Probably the best geospatial visualizations are done by overlaying the data on a map.
Whether it is the whole globe, a continent, a state, or even the sky, it is one of the easiest ways
for a viewer to comprehend the relationship between the data and the geography it is displayed on.

In this recipe, you will learn how to project data on a map using matplotlib's Basemap toolkit.


Getting ready 


As we are already familiar with matplotlib as our plotting engine, we can extend that to
matplotlib's capability to use other toolkits, one such example being the Basemap
mapping toolkit.

Basemap itself doesn't do any plotting. It just transforms given geospatial coordinates to a map
projection and gives that data to matplotlib for plotting.
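To illustrate the kind of transformation Basemap performs, here is a sketch of the spherical Mercator forward projection written directly with NumPy. It is not Basemap's code: Basemap uses its own ellipsoid parameters and shifts the origin so that the lower-left corner of the map is (0, 0), so its numbers will differ from these. The corner coordinates are the ones used in the recipe that follows.

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius, treated here as a sphere


def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Spherical Mercator: longitude/latitude in degrees -> x/y in metres."""
    lon = np.radians(lon_deg)
    lat = np.radians(lat_deg)
    x = radius * lon
    y = radius * np.log(np.tan(np.pi / 4 + lat / 2))
    return x, y


# Lower-left and upper-right corners of the map region.
x, y = mercator(np.array([-126.619875, -59.647219]),
                np.array([31.354158, 47.517613]))
print(x, y)
```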

First, we need to install the Basemap toolkit. If you are using EPD, Basemap is already
installed. If you are on Linux, it is best to use the native package manager to install the package
containing Basemap. On Ubuntu, for example, the package is called python-mpltoolkits.basemap
and can be installed using the standard package manager:

$ sudo apt-get install python-mpltoolkits.basemap 

On Mac OS X, it is recommended to use EPD, although installation using popular package 
managers such as Homebrew, Fink, and pip is also possible. 


How to do it... 


Here is an example of how to use the Basemap toolkit to plot a simple Mercator projection
within a specific region, specified by longitude and latitude coordinate pairs:

1. We instantiate Basemap, defining the projection to be used (merc for Mercator).

2. We define (in the same Basemap constructor) the longitude and latitude of the lower-left
and upper-right corners of the map.

3. We set up the Basemap instance map to draw coastlines and countries.

4. We set up the Basemap instance map to fill continents and draw the map boundary.

5. We instruct the Basemap instance map to draw meridians and parallels.

The following code shows how to use the Basemap toolkit to plot a simple Mercator projection:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='merc',
              resolution='h',
              area_thresh=0.1,
              llcrnrlon=-126.619875, llcrnrlat=31.354158,
              urcrnrlon=-59.647219, urcrnrlat=47.517613)

map.drawcoastlines()
map.drawcountries()

map.fillcontinents(color='coral', lake_color='aqua')
map.drawmapboundary(fill_color='aqua')

map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

plt.show()

This will give a recognizable portion of our globe: 



Now that we know how to plot a map, we need to know how to plot data on top of this map. 

If we recall that Basemap is a big transcoder of longitude and latitude pairs into the current map
projection, we will recognize that all we need is a dataset that contains longitudes and latitudes
that we can pass to Basemap for projecting, before plotting over it with matplotlib. We use the
cities.shp and cities.shx files to load the coordinates of US cities and project them
onto the map.

The files are provided in the Chapter06 folder of the code repository. Here's an example of how
to achieve this:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

map = Basemap(projection='merc',
              resolution='h',
              area_thresh=100,
              llcrnrlon=-126.619875, llcrnrlat=25,
              urcrnrlon=-59.647219, urcrnrlat=55)

# read cities.shp/cities.shx; the projected points end up in map.cities
# and the attribute records in map.cities_info
shapeinfo = map.readshapefile('cities', 'cities')
x, y = zip(*map.cities)

# build a list of US city names (empty label for other countries)
city_names = []
for each in map.cities_info:
    if each['COUNTRY'] != 'US':
        city_names.append("")
    else:
        city_names.append(each['NAME'])

map.drawcoastlines()
map.drawcountries()

map.fillcontinents(color='coral', lake_color='aqua')
map.drawmapboundary(fill_color='aqua')
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))

# draw city markers
map.scatter(x, y, 25, marker='o', zorder=10)

# plot labels at city coords
for city_label, city_x, city_y in zip(city_names, x, y):
    plt.text(city_x, city_y, city_label)

plt.title('Cities in USA')

plt.show()


How it Works... 


The basics of Basemap usage consist of importing the main module and instantiating a
Basemap class with the desired properties. What we must specify during instantiation is the
projection to be used and the portion of the globe that we want to work with.

Additional configuration can be applied before drawing the map and displaying the figure
window with matplotlib.pyplot.show().

More than a dozen (32, to be precise) different projections are supported in Basemap.
Most of them are intended for very narrow uses, but some are more general and apply to most
common map visualizations.

We can easily see what projections are available by asking the Basemap module itself: 

import mpl_toolkits.basemap

print(mpl_toolkits.basemap.supported_projections)
mbtfpq    McBryde-Thomas Flat-Polar Quartic
aeqd      Azimuthal Equidistant
sinu      Sinusoidal
poly      Polyconic
omerc     Oblique Mercator
gnom      Gnomonic
moll      Mollweide
lcc       Lambert Conformal
tmerc     Transverse Mercator
nplaea    North-Polar Lambert Azimuthal
gall      Gall Stereographic Cylindrical
npaeqd    North-Polar Azimuthal Equidistant
mill      Miller Cylindrical
merc      Mercator
stere     Stereographic
eqdc      Equidistant Conic
cyl       Cylindrical Equidistant
npstere   North-Polar Stereographic
spstere   South-Polar Stereographic
hammer    Hammer
geos      Geostationary
nsper     Near-Sided Perspective
eck4      Eckert IV
aea       Albers Equal Area
kav7      Kavrayskiy VII
spaeqd    South-Polar Azimuthal Equidistant
ortho     Orthographic
cass      Cassini-Soldner
vandg     van der Grinten
laea      Lambert Azimuthal Equal Area
splaea    South-Polar Lambert Azimuthal
robin     Robinson


If nothing else is specified, the whole projection is plotted using some reasonable
defaults.

To zoom in on a specific region of the map, we specify the latitude and longitude of the
lower-left and upper-right corners of the region we want to show. For this example, we will
use the Mercator projection.



The argument names are abbreviated descriptions:
llcrnrlon: This is the lower-left corner longitude
llcrnrlat: This is the lower-left corner latitude
urcrnrlon: This is the upper-right corner longitude
urcrnrlat: This is the upper-right corner latitude
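When the region should follow the data rather than fixed numbers, the four corner arguments can be derived from the points themselves. The helper below is hypothetical (it is not part of Basemap): it pads the data's bounding box by a margin in degrees and clamps the result to valid ranges, returning a dict that could be unpacked into the Basemap constructor. The 85-degree latitude cap is our own choice to keep Mercator away from the poles.

```python
def corners_from_points(lons, lats, margin=2.0, max_lat=85.0):
    """Return Basemap-style corner keyword arguments enclosing all points.

    margin is padding in degrees; max_lat keeps the box away from the poles,
    where the Mercator projection is undefined.
    """
    if len(lons) == 0 or len(lats) == 0:
        raise ValueError("need at least one point")
    return {
        'llcrnrlon': max(min(lons) - margin, -180.0),
        'llcrnrlat': max(min(lats) - margin, -max_lat),
        'urcrnrlon': min(max(lons) + margin, 180.0),
        'urcrnrlat': min(max(lats) + margin, max_lat),
    }


if __name__ == '__main__':
    # hypothetical sample coordinates (longitude, latitude)
    lons = [-122.4, -87.6, -74.0]
    lats = [37.8, 41.9, 40.7]
    print(corners_from_points(lons, lats))
```

The result would then be used as, for example, Basemap(projection='merc', resolution='l', **corners).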


There's more... 


We have just scratched the surface of the capabilities of the Basemap toolkit. More examples can
be found in the official documentation at http://matplotlib.org/basemap/users/examples.html.

Most of the data used in the examples in the official Basemap documentation is located on
remote servers and in a specific format. To fetch this data efficiently, the NetCDF data format is
used. NetCDF is a common data format designed with network efficiency in mind. It allows a
program to fetch only as much data as it needs, even when the whole dataset is very large, which
makes this format very practical. We don't have to download and store large datasets
locally every time we want to use them or every time they change.


Plotting data on a map using the Google Map API


In this recipe, we will diverge from the desktop environment and show how we can produce output for
the Web. Although the main languages for the web frontend are not Python but HTML, CSS, and
JavaScript, we can still use Python for the heavy lifting: fetching data, processing it, performing
intensive computations, and rendering data in a format suitable for web output, that is, creating
HTML pages with the JavaScript code required to render our visualizations.


Getting ready 


We will use the Google Data Visualization Library for Python to help us prepare data for the
frontend interface, where we will use another Google Visualization API to render the data in the
desired visualizations, that is, a map and a table.

Before we start, we need to install the google-visualization-python module.

Download the latest stable version from GitHub and install the module. The following actions
demonstrate how to do this:

$ git clone https://github.com/google/google-visualization-python.git 
$ cd google-visualization-python/ 

$ sudo python setup.py install 

Note that we have to become a superuser (that is, gain administrator privileges) to install this
module on our system.

A better option, if you don't want to pollute your OS packages, is to create a virtualenv
environment and install the packages just for this recipe. We explained how to deal with
virtualenv environments in Chapter 1, Preparing Your Working Environment.

For the frontend library we don't have to install anything, as that library will be loaded by the
web page directly from the Google servers.

We need active access to the Internet for this recipe, because its output will be a web
page that, when opened in a web browser, pulls the JavaScript libraries directly from
remote servers.

In this recipe, you will learn how to use the Google Data Visualization Library for Python and
JavaScript and combine them to create a web visualization.


How to do it... 


The following example shows how to visualize the Median Monthly Disposable Salary per
Country on a world map projection using Google Geochart and Table Visualization, loading
the data from a .csv file using Python and the gviz_api module. We will:

1. Implement a function to act as a template generator.

2. Use the csv module to load the data from the local .csv file.

3. Use DataTable to describe the data and LoadData to load the data from a list of
Python dictionaries.

4. Render the output to a web page.

This can be achieved with the following code:

import csv
import gviz_api


def get_page_template():
    page_template = """
<html>
  <script src="https://www.google.com/jsapi" type="text/javascript"></script>
  <script>
    google.load('visualization', '1', {packages:['geochart', 'table']});

    google.setOnLoadCallback(drawMap);
    function drawMap() {
      var json_data = new google.visualization.DataTable(%s, 0.6);

      var options = {colorAxis: {colors: ['#eee', 'green']}};

      var mymap = new google.visualization.GeoChart(
          document.getElementById('map_div'));
      mymap.draw(json_data, options);

      var mytable = new google.visualization.Table(
          document.getElementById('table_div'));
      mytable.draw(json_data, {showRowNumber: true});
    }
  </script>

  <body>
    <h1>Median Monthly Disposable Salary World Countries</h1>

    <div id="map_div"></div>
    <hr />
    <div id="table_div"></div>

    <div id="source">
      <hr />
      <small>
        Source:
        <a href="http://www.numbeo.com/cost-of-living/prices_by_country.jsp?displayCurrency=EUR&itemId=105">
          http://www.numbeo.com/cost-of-living/prices_by_country.jsp?displayCurrency=EUR&itemId=105
        </a>
      </small>
    </div>
  </body>
</html>
"""
    return page_template
def main():
    # Load data from CSV file
    afile = "median-dpi-countries.csv"
    datarows = []
    with open(afile, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        for row in reader:
            datarows.append(row)

    # Describe data
    description = {"country": ("string", "Country"),
                   "dpi": ("number", "EUR"), }

    # Build list of dictionaries from loaded data
    data = []
    for each in datarows:
        data.append({"country": each[0],
                     "dpi": (float(each[1]), each[1])})

    # Instantiate DataTable with structure defined in 'description'
    data_table = gviz_api.DataTable(description)

    # Load it into gviz_api.DataTable
    data_table.LoadData(data)

    # Creating a JSON string
    json = data_table.ToJSon(columns_order=("country", "dpi"),
                             order_by="country", )

    # Put JSON string into the template
    # and save to output.html
    with open('output.html', 'w') as out:
        out.write(get_page_template() % (json,))


if __name__ == '__main__':
    main()
This will produce the output.html file, which we can open in our favorite web browser.
The page should look like the following screenshot:



How it Works... 


The main entry point here is our main() function. First, we use the csv module to load our
data. This data is obtained from the public website www.numbeo.com and put into
the .csv format. The final file is available in the repository for this chapter, in the Chapter06
folder. To be able to use the Google Data Visualization Library, we need to describe the data to it.
We describe the data using Python dictionaries, where we define the ID of each column, its
data type, and an optional label. In the following example, each column is defined using the
structure {"name": ("data_type", "Label")}:

description = {"country": ("string", "Country"), 

"dpi": ("number", "EUR"), } 
Then we need to fit our loaded .csv rows into this format. We will build a list of dictionaries in
the data variable.

Now we have everything we need to instantiate our data_table with gviz_api.DataTable using
the described structure. We then load the data into it and output it in the JSON format into our
page_template.
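To see roughly what the JavaScript side receives, the sketch below builds the Google Charts DataTable JSON literal (a cols list plus rows whose cells look like {"v": value, "f": formatted}) by hand with the standard json module. The helper to_datatable_json is hypothetical; gviz_api's real ToJSon() output may differ in details such as key order or date handling, so treat this only as an illustration of the data's shape.

```python
import json


def to_datatable_json(description, rows, columns_order, order_by=None):
    """Hypothetical stand-in for DataTable(...).ToJSon(); returns a JSON string."""
    cols = [{"id": col_id,
             "label": description[col_id][1],
             "type": description[col_id][0]}
            for col_id in columns_order]

    def cell_value(value):
        # the recipe passes "dpi" as a (value, formatted_value) pair
        return value[0] if isinstance(value, tuple) else value

    if order_by is not None:
        rows = sorted(rows, key=lambda row: cell_value(row[order_by]))

    json_rows = []
    for row in rows:
        cells = []
        for col_id in columns_order:
            value = row[col_id]
            if isinstance(value, tuple):
                cells.append({"v": value[0], "f": value[1]})
            else:
                cells.append({"v": value})
        json_rows.append({"c": cells})
    return json.dumps({"cols": cols, "rows": json_rows})


if __name__ == '__main__':
    description = {"country": ("string", "Country"),
                   "dpi": ("number", "EUR")}
    # placeholder rows, not real salary data
    data = [{"country": "Country B", "dpi": (2.0, "2.0")},
            {"country": "Country A", "dpi": (1.0, "1.0")}]
    print(to_datatable_json(description, data, ("country", "dpi"),
                            order_by="country"))
```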

The get_page_template() function contains the other part of this equation. It contains
the client (frontend) code that produces an HTML web page, plus the JavaScript code that loads
the Google Data Visualization Library from the Google servers. The line that loads the Google
JavaScript API is:

<script src="https://www.google.com/jsapi"
type="text/javascript"></script>

After this follows another pair of <script>...</script> tags that contains additional
setup. First, we load the Google Data Visualization Library and the required packages, geochart
and table:

google.load('visualization', '1', {packages:['geochart',
'table']});

Then we set up a function that will be called when the page is loaded. This event in the web
world is registered as onLoad, so the callback is set up via the setOnLoadCallback function:

google.setOnLoadCallback(drawMap);

This defines that when the page is loaded, the Google instance will call the custom function
drawMap() that we defined. The drawMap function loads a JSON string into the JavaScript
version of the DataTable instance:

var json_data = new google.visualization.DataTable(%s, 0.6);

Following that, we create a geochart instance in an HTML element with the ID map_div:

var mymap = new google.visualization.GeoChart(
document.getElementById('map_div'));

We then draw the map using json_data and the provided custom options:

mymap.draw(json_data, options);

Similarly, Google's JavaScript table is rendered below the map:

var mytable = new google.visualization.Table(
document.getElementById('table_div'));
mytable.draw(json_data, {showRowNumber: true});
We save this output as an HTML file that we can open in a browser. This is not very useful for
the dynamic rendering of a web service. A better option is to output the HTTP
response directly from Python, and thus build a background service that responds to client web
requests with JSON that the client can load and render.


If you want to understand more about reading HTTP responses, please
read more on the HTTP protocol and response messages at
http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Response_message.

We do this by replacing the ToJSon() call with ToJSonResponse(), which takes the same
arguments. This call responds with a proper HTTP response containing the payload: our
JSON-ified data_table, ready to be consumed by our JavaScript client.
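Something still has to serve that payload over HTTP. As a minimal, assumption-laden sketch of such a background service, the code below uses only the Python 3 standard library (http.server) to answer every GET request with a JSON body. The handler name, the placeholder rows, and the helper serve_in_background are hypothetical; in the recipe, the body would instead be produced from the gviz_api DataTable.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Placeholder rows standing in for the data loaded from the .csv file
ROWS = [{"country": "Country A", "dpi": 1.0},
        {"country": "Country B", "dpi": 2.0}]


class JSONDataHandler(BaseHTTPRequestHandler):
    """Answer every GET request with ROWS serialized as JSON."""

    def do_GET(self):
        body = json.dumps(ROWS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def serve_in_background(port=0):
    """Start the server on a daemon thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), JSONDataHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server


if __name__ == "__main__":
    server = serve_in_background()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    print(urlopen(url).read().decode("utf-8"))
    server.shutdown()
    server.server_close()
```

The JavaScript client would then request this URL instead of having the data baked into a static HTML file.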



There's more... 


This, of course, is just one example of how we can combine Python as a backend language,
sitting on our server and doing the data fetching and processing, while the frontend is left to the
universal HTML/JavaScript/CSS set of languages. This enables us to provide interactive and
dynamic interfaces with visualizations to a wide audience without requiring them to install
anything (well, apart from a web browser, but that is usually installed on a computer or
smartphone). That said, we must note that the quality of these outputs is not as high
as that of matplotlib; the strength of matplotlib lies in its high-quality output.

To work more with the web (and Python), you would have to learn more about the web
technologies and languages used. This book does not cover such topics but does give an
insight into how to achieve one possible solution using well-known third-party libraries that
produce pleasing web outputs with as little web coding as possible.

More documentation is available on the Google Developers portal at
https://developers.google.com/chart/interactive/docs/dev/gviz_api_lib.



Generating CAPTCHA images 


Although this is not strictly data visualization in the usual sense, the ability to generate images
using Python comes in handy in many cases, and this is one of them.

In this recipe, we will cover the generation of random images to tell humans and
computers apart: CAPTCHA images.
Getting ready


CAPTCHA stands for Completely Automated Public Turing test to tell Computers and
Humans Apart, and is trademarked by Carnegie Mellon University. This test is used to
challenge computer programs (usually referred to as bots) that automatically fill in various
web forms that are primarily targeted at humans and that should not be automated. Usual
examples are sign-up forms, login forms, surveys, and similar.

CAPTCHA itself can take various forms, but the most common form consists of a challenge
where a human should read an image with distorted characters and numbers and type the
result into the related response field.

In this recipe, you will learn how to harness the Python Imaging Library to generate images,
render lines and points, and also render text.


How to do it... 


We will show you what is involved in creating a personal and simple CAPTCHA generator by
performing the following steps:

1. Define size, text, font size, background color, and CAPTCHA length. 

2. Pick random characters from the English alphabet. 

3. Draw those on the image using defined font and colors. 

4. Add some noise in the form of lines and arcs.

5. Return the image object to the caller together with the CAPTCHA challenge. 

6. Show the generated image to the user. 

The following code shows how to create a personal and simple CAPTCHA generator: 

from PIL import Image, ImageDraw, ImageFont
import random
import string


class SimpleCaptchaException(Exception):
    pass


class SimpleCaptcha(object):

    def __init__(self, length=5, size=(200, 100), fontsize=36,
                 random_text=None, random_bgcolor=None):
        self.size = size
        self.text = "CAPTCHA"
        self.fontsize = fontsize
        self.bgcolor = 255
        self.length = length

        self.image = None  # current captcha image

        if random_text:
            self.text = self._random_text()

        if not self.text:
            raise SimpleCaptchaException("Field text must not be empty.")

        if not self.size:
            raise SimpleCaptchaException("Size must not be empty.")

        if not self.fontsize:
            raise SimpleCaptchaException("Font size must be defined.")

        if random_bgcolor:
            self.bgcolor = self._random_color()

    def _center_coords(self, draw, font):
        # textbbox() (Pillow >= 8.0) replaces textsize(), removed in Pillow 10
        left, top, right, bottom = draw.textbbox((0, 0), self.text, font=font)
        width, height = right - left, bottom - top
        xy = (self.size[0] - width) / 2., (self.size[1] - height) / 2.
        return xy

    def _add_noise_dots(self, draw):
        size = self.image.size
        for _ in range(int(size[0] * size[1] * 0.1)):
            draw.point((random.randint(0, size[0]),
                        random.randint(0, size[1])),
                       fill="white")
        return draw

    def _add_noise_lines(self, draw):
        size = self.image.size
        for _ in range(8):
            width = random.randint(1, 2)
            start = (0, random.randint(0, size[1] - 1))
            end = (size[0], random.randint(0, size[1] - 1))
            draw.line([start, end], fill="white", width=width)
        for _ in range(8):
            start = (-50, -50)
            end = (size[0] + 10, random.randint(0, size[1] + 10))
            draw.arc(start + end, 0, 360, fill="white")
        return draw

    def get_captcha(self, size=None, text=None, bgcolor=None):
        if text is not None:
            self.text = text
        if size is not None:
            self.size = size
        if bgcolor is not None:
            self.bgcolor = bgcolor

        self.image = Image.new('RGB', self.size, self.bgcolor)

        # Note that the font file must be present
        # or point to your OS's system font
        # Ex. on Mac the path should be '/Library/Fonts/Tahoma.ttf'
        font = ImageFont.truetype('fonts/Vera.ttf', self.fontsize)
        draw = ImageDraw.Draw(self.image)

        xy = self._center_coords(draw, font)
        draw.text(xy=xy, text=self.text, font=font)

        # Add some dot noise
        draw = self._add_noise_dots(draw)

        # Add some random lines
        draw = self._add_noise_lines(draw)

        self.image.show()

        return self.image, self.text

    def _random_text(self):
        letters = string.ascii_lowercase + string.ascii_uppercase
        random_text = ""
        for _ in range(self.length):
            random_text += random.choice(letters)
        return random_text

    def _random_color(self):
        r = random.randint(0, 255)
        g = random.randint(0, 255)
        b = random.randint(0, 255)
        return (r, g, b)


if __name__ == "__main__":
    sc = SimpleCaptcha(length=7, fontsize=36, random_text=True,
                       random_bgcolor=True)
    sc.get_captcha()

This produces an image similar to the following: 



How it Works... 


This example shows a process for using Python's imaging library to generate predefined 
images, to create a simple, yet effective, CAPTCHA generator. 

We wrapped the functionality into a single class, SimpleCaptcha, because it gives us a safe
space for future development. We also created a custom SimpleCaptchaException to
accommodate future exception hierarchies.



If you are writing anything more than simple and quick scripts, it is always
good to start writing and designing custom exception hierarchies for your
domain, rather than using Python's generic standard exceptions. You will
gain a lot in the readability and maintainability of the software.
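As an illustration of that advice, here is a small hypothetical hierarchy for a CAPTCHA package: one base class that callers can catch, plus narrower subclasses for specific failures. The class names and check_config are our own and are not part of the recipe's code.

```python
class CaptchaError(Exception):
    """Base class for every error raised by our (hypothetical) CAPTCHA code."""


class CaptchaConfigError(CaptchaError):
    """Invalid settings, such as an empty size or a missing font size."""


class CaptchaFontError(CaptchaError):
    """The TrueType font file could not be loaded."""


def check_config(size, fontsize):
    """Validate settings, raising the most specific error that applies."""
    if not size:
        raise CaptchaConfigError("Size must not be empty.")
    if not fontsize:
        raise CaptchaConfigError("Font size must be defined.")
    return True


if __name__ == "__main__":
    try:
        check_config(None, 36)
    except CaptchaError as err:  # one except clause covers the whole family
        print("Cannot build CAPTCHA: %s" % err)
```

Callers that only care that "something CAPTCHA-related failed" catch CaptchaError, while code that can recover from a specific problem (for example, falling back to another font on CaptchaFontError) catches the narrower class.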


Start reading from the main section. At the end of the code listing, we instantiate the class,
passing the settings of our future image as arguments to the constructor. Following that, we call the
get_captcha method on the sc object. For this recipe's purposes, get_captcha shows
the image object as a result, but we also return the image object to the potential caller of this
method so that it can make use of the result. The usage can vary; the caller could either save
the image to a file or, if this was a web application, return the image stream to the client
requesting this CAPTCHA while keeping the written challenge on the server.
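For the web application case, the image usually has to become bytes rather than a file on disk. A minimal sketch, assuming Pillow and an in-memory io.BytesIO buffer; the helper name image_to_png_bytes is hypothetical:

```python
import io

from PIL import Image


def image_to_png_bytes(image):
    """Encode a PIL image as PNG and return the raw bytes."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


if __name__ == "__main__":
    # stand-in for the image returned by SimpleCaptcha.get_captcha()
    demo = Image.new("RGB", (200, 100), (255, 255, 255))
    png = image_to_png_bytes(demo)
    print(len(png), "bytes of PNG data")
```

A web framework would send these bytes with a Content-Type of image/png, while the CAPTCHA text stays on the server (for example, in the user's session) for the later comparison.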

The important thing to note is that in order to finish the challenge-response process of the
CAPTCHA test, we must return the CAPTCHA string generated on the image as text, so that the
caller can compare the user's response with the expected value.
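The comparison itself can be as small as the hypothetical helper below. It is a sketch under two assumptions of ours: surrounding whitespace is ignored and, by default, so is letter case (the recipe mixes upper- and lowercase letters, which can be hard to tell apart in a distorted image). hmac.compare_digest is used so that the comparison time does not reveal how many leading characters matched.

```python
import hmac


def check_response(expected, response, case_sensitive=False):
    """Return True if the user's typed response matches the CAPTCHA text."""
    if response is None:
        return False
    expected = expected.strip()
    response = response.strip()
    if not case_sensitive:
        expected = expected.lower()
        response = response.lower()
    # constant-time comparison of the encoded strings
    return hmac.compare_digest(expected.encode("utf-8"),
                               response.encode("utf-8"))


if __name__ == "__main__":
    print(check_response("AbCdEfG", " abcdefg "))
    print(check_response("AbCdEfG", "abcdefx"))
```

Whether to ignore case is a design choice: it trades a little security for fewer frustrated users.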

The get_captcha method first checks its input arguments and, if the user provides custom
values, uses them to override the instance defaults. After that, a new image object is instantiated by
Image.new. This object is saved in self.image, where we use it to draw and write text.
Having written the text to the image, we add noise in the form of randomly placed points and lines,
as well as some arc segments.

These tasks are carried out by the _add_noise_dots and _add_noise_lines
methods. The first one loops many times (once for every ten pixels of the image area) and adds a
point at a random location on the image, while the latter draws lines from the left-hand side
of the image to the right-hand side of the image, followed by a few arcs.


There's more... 


We constructed this class using some assumptions about its use. We assumed that the user
will just want to accept our default settings (that is, seven random characters on a random
background color) and receive the result from it. That is the reasoning behind placing helper
functions in the constructor to set random text and a random background color. If the most
frequent and effective usage is to always override the configuration, then we want to remove
these operations from the constructor and place them in separate calls.

For example, maybe a user always wants to use English words as the CAPTCHA challenge.
If this is the case, we want to be able to just call a method that provides us with results like that.
This method could be get_english_captcha; following the random logic of the constructor,
we would then construct that method to pick random words from the provided English
dictionary. On a Unix system, there is a common English dictionary in
/usr/share/dict/words that we could use for this:

def get_english_captcha(self):
    words = '/usr/share/dict/words'
    with open(words, 'r') as wf:
        words = wf.readlines()
        aword = random.choice(words)
        aword = aword.strip()  # remove newline and spaces
    return self.get_captcha(text=aword)

Overall, this CAPTCHA generation example is not production quality and should not be
used without adding more protection and randomness, such as letter rotation.

If you need to protect your web forms from bots, there are already third-party Python modules
and libraries that you could use. There are even specialized modules built for the existing web
frameworks.

There are also web services such as reCAPTCHA (http://www.google.com/recaptcha),
with an already proven Python module, recaptcha-client
(https://pypi.python.org/pypi/recaptcha-client), that you can sign up for and use. It does not
require any imaging libraries because the image is pulled directly from the reCAPTCHA web
service, but it has other dependencies such as pycrypto. By using this web service and library,
you are also helping to transcribe books scanned with Optical Character Recognition (OCR) for
the Google Books project, as well as old editions of The New York Times. Read more on the
reCAPTCHA website.
Using the Right Plots to Understand Data


In this chapter, you will cover the following recipes: 

► Understanding logarithmic plots 

► Understanding spectrograms 

► Creating a stem plot

► Drawing streamlines of vector flow 

► Using colormaps 

► Using scatter plots and histograms 

► Plotting the cross correlation between two variables 

► Importance of autocorrelation 


Introduction 


In this chapter, we will focus more on understanding what we want to say with the data that
we are presenting, and how to say it effectively. We will present some new techniques and
plots, but all of them will be underpinned by an understanding of the information we want to
convey to the user. Let's ask the question: why do we want to present information in this form?
This is the most important question that should be asked during the data exploration phase. If
we miss the opportunity to understand the data and present it appropriately, the viewer will
certainly not understand the data correctly.

Understanding logarithmic plots


More often than not, while reading daily newspapers and similar articles, one can find charts
that media organizations use to misrepresent the facts. One usual example is using
linear scales to create so-called panic charts, where a constantly growing value is followed
over a long period of time (years) and the starting values are several orders of magnitude
smaller than the latest one. These values, when visualized correctly on a logarithmic scale,
would (and usually should) produce linear or almost linear charts. This takes some of the panic
out of the articles they illustrate.


Getting ready 


With the logarithmic scale, the ratio of consecutive values is constant. This is important when
we are trying to read log plots. With linear (arithmetic) scales, the constant is the distance
between consecutive values. In other words, on logarithmic plots equal distances correspond
to equal ratios, such as one order of magnitude. We will see this illustrated on the following
plots. The code used to produce this figure is explained here.
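The constant-ratio property can be checked numerically. In this small sketch (the variable names mirror the recipe's y and z, but the values are our own), evenly spaced exponents give y = 10**x values whose consecutive ratios are all 10 and whose base-10 logarithms are evenly spaced, which is why such data plots as a straight line on a log axis:

```python
import numpy as np

x = np.arange(1, 11)               # 1, 2, ..., 10
y = 10.0 ** x                      # exponential data
z = 2.0 * x                        # linear data

ratios = y[1:] / y[:-1]            # ratio between consecutive values
log_steps = np.diff(np.log10(y))   # distance between values on a log axis
linear_steps = np.diff(z)          # distance between values on a linear axis

print(ratios)        # every entry is 10
print(log_steps)     # every entry is 1 (up to floating-point rounding)
print(linear_steps)  # every entry is 2
```

By contrast, np.diff(y) grows by a factor of 10 at every step, which is what makes exponential data look like a sudden explosion on a linear axis.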

As a general rule of thumb, logarithmic scales should be used in the following cases:

► the values span several orders of magnitude

► the data is skewed toward large values (some points are much larger than the rest of the data)

► you want to show the rate of change (growth rate) and not the value of change

Don't blindly follow these rules; they are more like hints than rules. Always use your own
judgment about the data at hand and the requirements presented to you by the project or customer.

Depending on the data range, different log bases should be used. The standard base for the
log is 10, but if the range of the data is smaller, a base of 2 can prove to be more useful, as it
will show more "resolution" within the smaller range.
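In matplotlib, the base is passed as a keyword argument to set_yscale. A short sketch, assuming matplotlib 3.3 or newer, where the keyword is base (older releases, like the ones this book was written against, spelled it basey):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)
y = 2.0 ** x  # powers of two: 2, 4, ..., 1024

fig, ax = plt.subplots()
ax.plot(x, y, marker='o')
# base=2 places the major ticks at powers of two instead of powers of ten
ax.set_yscale('log', base=2)
ax.set_title('Powers of two on a base-2 logarithmic axis')
ax.grid(True, which='both', axis='both')
```

Calling plt.show() afterwards displays the figure; with base 10, this whole range would cover only about three decades.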

If we have a range of data suitable for display on logarithmic scales, we will note that values
that were previously too close to judge any difference are now well apart. This allows us to
read the chart much more easily than if we presented the data on a linear scale.

Growth rate charts, where long-range time series data is collected, are where we want to
see not the absolute value measured at a point in time, but the growth over time. We will still get
the absolute value information, but that information is of lower priority.

Also, if the data distribution has positive skew (for example, salaries), taking the logarithm of
the value (salary) will help us fit the data into the model, as the logarithm transformation will
give us a more normal data distribution.
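To illustrate the skew claim, this sketch draws synthetic, positively skewed "salaries" from a log-normal distribution and compares the sample skewness before and after taking the logarithm. The log-normal choice is our own assumption (its logarithm is exactly normal), and these are not real salary figures. The skewness helper is written out with NumPy so no extra statistics package is needed.

```python
import numpy as np


def skewness(values):
    """Sample skewness: third central moment over the variance to the power 1.5."""
    values = np.asarray(values, dtype=float)
    centred = values - values.mean()
    return np.mean(centred ** 3) / np.mean(centred ** 2) ** 1.5


rng = np.random.default_rng(0)
# synthetic, hypothetical salaries with a long right tail
salaries = rng.lognormal(mean=7.5, sigma=0.8, size=10000)

raw_skew = skewness(salaries)
log_skew = skewness(np.log(salaries))

print("skewness of salaries:      %.2f" % raw_skew)
print("skewness of log(salaries): %.2f" % log_skew)
```

The raw values are strongly right-skewed, while their logarithms are close to symmetric, which is what makes them easier to fit with models that assume normally distributed data.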
How to do it...


We will exemplify this with sample code that shows the same two datasets (one linear and
one logarithmic in nature) on different plots (in the same figure) using different scales
(linear and logarithmic).

We will be performing the following steps with the help of the code mentioned after the steps:

1. Generate two simple datasets, y (exponential/logarithmic in nature) and z (linear
in nature).

2. Create a figure containing a grid of four subplots.

3. Create two subplots containing the y dataset, one on a logarithmic scale and one on a
linear scale.

4. Create another two subplots containing the z dataset, again, one logarithmic and the
other linear.

Here is the code: 

from matplotlib import pyplot as plt
import numpy as np

x = np.linspace(1, 10)
y = [10 ** el for el in x]
z = [2 * el for el in x]

fig = plt.figure(figsize=(10, 8))

ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(x, y, color='blue')
ax1.set_yscale('log')
ax1.set_title(r'Logarithmic plot of $ {10}^{x} $ ')
ax1.set_ylabel(r'$ {y} = {10}^{x} $')
plt.grid(True, which='both', axis='both')

ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(x, y, color='red')
ax2.set_yscale('linear')
ax2.set_title(r'Linear plot of $ {10}^{x} $ ')
ax2.set_ylabel(r'$ {y} = {10}^{x} $')
plt.grid(True, which='both', axis='both')

ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(x, z, color='green')
ax3.set_yscale('log')
ax3.set_title(r'Logarithmic plot of $ {2}*{x} $ ')
ax3.set_ylabel(r'$ {y} = {2}*{x} $')
plt.grid(True, which='both', axis='both')

ax4 = fig.add_subplot(2, 2, 4)
ax4.plot(x, z, color='magenta')
ax4.set_yscale('linear')
ax4.set_title(r'Linear plot of $ {2}*{x} $ ')
ax4.set_ylabel(r'$ {y} = {2}*{x} $')
plt.grid(True, which='both', axis='both')

plt.show()

This code will produce the following figure: 






How it Works... 


We generate some sample data and two dependent variables— y and z. Variable y is expressed 
as exponentiai function of data (x), and variable z is simple linear function of x. This heips us 
illustrate different looks of linear and exponentiai charts. 
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We then create grid of four subpiots, where the top row subpiots are of data (x, y) and bottom 
row are of data (x, z) pairs. 

Lookingfrom left-hand side, columns charts have logarithmic scales on they-axis, while 
right-hand side coiumns are in iinear scaie. We set this using set_yscale (' log ') for 
each axis separateiy. 

For every subpiot, we set a titie and iabei, where iabei aiso describes the function piotted. 

With plt.grid(b=True, which='both', axis='both'), we turn the grid on for both
axes and for both the major and minor ticks.

We observe how linear functions are straight lines on linear plots, while exponential functions
are straight lines on logarithmic plots.
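To see why the exponential data turns into a straight line, recall that a logarithmic y axis places each point at the logarithm of its value. The short NumPy sketch below (it reuses the recipe's x, y, and z definitions; nothing else is assumed) compares the slope between neighboring points after taking log10: for 10**x the slope is constant, for 2*x it is not.

```python
import numpy as np

x = np.linspace(1, 10)   # the recipe's 50 sample points
y = 10 ** x              # exponential data
z = 2 * x                # linear data

# A log-scaled y axis positions each point at log10(value).
log_y = np.log10(y)
log_z = np.log10(z)

# A straight line has the same slope between every pair of neighbors.
slopes_y = np.diff(log_y) / np.diff(x)
slopes_z = np.diff(log_z) / np.diff(x)

print("spread of slopes, 10**x on a log axis:", slopes_y.max() - slopes_y.min())
print("spread of slopes, 2*x on a log axis:  ", slopes_z.max() - slopes_z.min())
```

The first spread is essentially zero (a straight line), while the second is clearly positive, which is why the linear function bends on the logarithmic subplot.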


Understanding spectrograms 


A spectrogram is a time-varying spectral representation that shows how the spectral density
of a signal varies with time.

It represents the spectrum of frequencies of a sound or other signal in a visual manner.

It is used in various scientific fields, from sound fingerprinting such as voice recognition to radar
engineering and seismology.

Usually, a spectrogram is laid out as follows: the x axis represents time, the y axis represents
frequency, and the third dimension, the amplitude of each frequency-time pair, is color coded. Because
this is three-dimensional data, we could also create a 3D plot where the intensity is represented
as height on the z axis. The problem with 3D charts is that humans are bad at reading
and comparing them. Also, they tend to take more space than 2D charts.
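To make the "time on x, frequency on y, color for amplitude" layout concrete, here is a simplified short-time Fourier transform written with NumPy only. It is a sketch of the idea, not matplotlib's implementation: specgram also detrends and scales each block, so absolute values differ. The test signal (a 100 Hz tone switching to 300 Hz) is a hypothetical example.

```python
import numpy as np

def simple_spectrogram(signal, fs, nfft=256, noverlap=128):
    """Return (power, freqs, times) for a real-valued 1D signal.

    Each column of `power` is the squared magnitude of the FFT of one
    Hanning-windowed block of `nfft` samples; consecutive blocks start
    `nfft - noverlap` samples apart.
    """
    step = nfft - noverlap
    window = np.hanning(nfft)
    starts = range(0, len(signal) - nfft + 1, step)
    # rfft keeps only the non-negative frequencies of a real signal.
    power = np.array([np.abs(np.fft.rfft(window * signal[s:s + nfft])) ** 2
                      for s in starts]).T        # rows: frequency, columns: time
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = np.array([(s + nfft / 2) / fs for s in starts])  # block centers
    return power, freqs, times

# Two seconds of a 100 Hz tone that switches to 300 Hz halfway through.
fs = 1000
t = np.arange(0, 2.0, 1.0 / fs)
sig = np.where(t < 1.0, np.sin(2 * np.pi * 100 * t), np.sin(2 * np.pi * 300 * t))

power, freqs, times = simple_spectrogram(sig, fs)
peak = freqs[power.argmax(axis=0)]   # strongest frequency in each time block
print(peak[0], peak[-1])
```

Coloring `power` by value over the (times, freqs) grid gives exactly the kind of picture that specgram draws.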


Getting ready 


For serious signal processing, we would go into low-level details to be able to detect patterns
and automatically fingerprint specific signals, but for this data visualization recipe, we will leverage a
couple of well-known Python libraries to read in an audio file, sample it, and plot a spectrogram.

In order to read .wav files to visualize sound, we need to do some prep work. We need to
install the libsndfile1 system library for reading/writing audio files. This is done via your
favorite package manager. For Ubuntu, you can use:

$ sudo apt-get install libsndfile1-dev

It is important to install the dev package, which contains header files, so pip can build the
scikits.audiolab module.


We can also install libasound, the ALSA (Advanced Linux Sound Architecture) headers, to
avoid the runtime warning. This is optional, as we are not going to use features provided
by the ALSA library. For Ubuntu Linux, issue the following command:

$ sudo apt-get install libasound2-dev 

To install scikits.audiolab, which we will use to read .wav files, we will use pip:

$ pip install scikits.audiolab 

Always remember to enter the virtual environment for your
current project, as you don't want to dirty system libraries.


How to do it... 


For this recipe, we will use the prerecorded sound file test.wav that can be found in the file
repository for this book. But we could also generate a sample, which we will try later.

In the following example, we perform these steps in order:

1. Read the .wav file that contains the recorded sound sample

2. Define the length of the window used for the Fourier transform: NFFT

3. Define the number of overlapping data points while sampling: noverlap



NFFT defines the number of data points used for computing the Discrete
Fourier Transform in each block. The computation is most efficient when
NFFT is a power of two. The windows can overlap, and the number
of data points that are overlapped (that is, repeated) is defined by the
noverlap argument.
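A little arithmetic shows how NFFT and noverlap trade time resolution against frequency resolution. The helper below is a sketch that counts the full blocks a sliding window fits into the data (matplotlib's own padding rules may differ slightly), reports the width of one frequency bin (Fs / NFFT), and reports the time step between blocks. The 44100 Hz sample rate is a hypothetical value, since the rate of test.wav is not stated.

```python
def specgram_layout(n_samples, fs, nfft, noverlap):
    """Return (n_blocks, bin_width_hz, hop_seconds) for spectrogram settings."""
    if not 0 <= noverlap < nfft:
        raise ValueError("noverlap must be in the range [0, nfft)")
    hop = nfft - noverlap                      # samples between block starts
    n_blocks = (n_samples - noverlap) // hop   # full blocks that fit in the data
    return n_blocks, fs / nfft, hop / fs

def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...: the FFT sizes that compute fastest."""
    return n > 0 and n & (n - 1) == 0

# Hypothetical: 5 seconds at 44100 Hz with the recipe's NFFT and noverlap.
print(specgram_layout(5 * 44100, 44100, nfft=128, noverlap=65))
print(is_power_of_two(128), is_power_of_two(130))
```

Doubling NFFT halves the bin width (finer frequency detail) but makes each block cover twice as much time; raising noverlap produces more, smoother time columns at extra computational cost.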


import os 

from math import floor, log 

from scikits.audiolab import Sndfile 
import numpy as np 

from matplotlib import pyplot as plt 
# Load the sound file into a Sndfile instance
soundfile = Sndfile("test.wav") 


# define start/stop seconds and compute start/stop frames 
start_sec = 0 

stop_sec = 5 

start_frame = start_sec * soundfile.samplerate 
stop_frame = stop_sec * soundfile.samplerate 

# go to the start frame of the sound object 
soundfile.seek(start_frame) 

# read number of frames from start to stop 
delta_frames = stop_frame - start_frame 
sample = soundfile.read_frames(delta_frames) 


map = 'CMRmap'

fig = plt.figure(figsize=(10, 6), ) 
ax = fig.add_subplot(111) 

# define number of data points for FT 
NFFT = 128 

# define number of data points to overlap for each block 
noverlap = 65 

pxx, freq, t, cax = ax.specgram(sample, Fs=soundfile.samplerate, 

NFFT=NFFT, noverlap=noverlap, 
cmap=plt.get_cmap(map)) 

plt.colorbar(cax)

plt.xlabel("Time [sec]")

plt.ylabel("Frequency [Hz]")

plt.show()
This generates the following spectrogram, with visible "white-like" traces for separate notes.



How it Works... 


We need to load a sound file first. To do this, we use the scikits.audiolab.Sndfile
class and provide it with a filename. This will instantiate a sound object, which we can then
query for data and call functions on.

To read the data needed for the spectrogram, we need to read the desired frames of data from our
sound object. We first move to the start frame with seek() and then call read_frames(), which
accepts the number of frames to read (the difference between the end and start frames). We
calculate each frame number by multiplying the sample rate by the time points (start, end) we
want to visualize.
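If installing scikits.audiolab is not an option, the same seek-then-read pattern can be sketched with Python's standard-library wave module. This is an alternative, not the recipe's method, and it assumes a 16-bit mono PCM file; the function name read_segment is hypothetical.

```python
import array
import sys
import wave

def read_segment(path, start_sec, stop_sec):
    """Return (sample_rate, samples) for start_sec..stop_sec of a 16-bit mono WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono files")
        rate = wav.getframerate()
        start_frame = int(start_sec * rate)   # seconds -> frame index
        stop_frame = int(stop_sec * rate)
        wav.setpos(start_frame)               # plays the role of Sndfile.seek()
        # Like read_frames(), readframes() takes a frame count, not an end index.
        raw = wav.readframes(stop_frame - start_frame)
    samples = array.array("h")                # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":                # WAV stores little-endian data
        samples.byteswap()
    return rate, samples
```

For the recipe, `rate, sample = read_segment("test.wav", 0, 5)` would yield a sample sequence that can be passed to specgram with `Fs=rate`, provided test.wav really is 16-bit mono.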


There's more... 


If you can't find an audio (wave) file, you can easily generate a signal. Here's how to generate it:

import numpy
from matplotlib import pyplot as plt


def _get_mask(t, t1, t2, lvl_pos, lvl_neg):
    if t1 >= t2:
        raise ValueError("t1 must be less than t2")

    return numpy.where(numpy.logical_and(t > t1, t < t2), lvl_pos,
                       lvl_neg)


def generate_signal(t):
    sin1 = numpy.sin(2 * numpy.pi * 100 * t)
    sin2 = 2 * numpy.sin(2 * numpy.pi * 200 * t)

    # add interval of high pitched signal
    sin2 = sin2 * _get_mask(t, 2, 5, 1.0, 0.0)

    noise = 0.02 * numpy.random.randn(len(t))
    final_signal = sin1 + sin2 + noise
    return final_signal


if __name__ == '__main__':
    step = 0.001
    sampling_freq = 1000
    t = numpy.arange(0.0, 20.0, step)
    y = generate_signal(t)

    # we can visualize this now

    # in time
    plt.subplot(211)
    plt.plot(t, y)

    # and in frequency
    plt.subplot(212)
    plt.specgram(y, NFFT=1024, noverlap=900,
                 Fs=sampling_freq, cmap=plt.cm.gist_heat)

    plt.show()
This will give you the following figure, where the top subplot represents the signal we generated.
Here, the x axis is time and the y axis is the signal's amplitude. The bottom subplot is the same
signal in the frequency domain. Here, while the x axis is the same time as in the top subplot (we
matched the time by selecting the sampling rate), the y axis is the frequency of the signal.



Creating a stem plot


A two-dimensional stem plot displays data as lines extending from a baseline along the
x axis. A circle (the default) or another marker, whose y position represents the data value,
terminates each stem.

In this recipe, we will discuss how to create a stem plot.

Do not confuse stem plots with stem-and-leaf plots, which are a method of representing data by
separating the last (least significant) digit of values as leaves and the higher-order digits as stems.
stem | leaf
6 7 8
0 2 3 4 7 7 7 8 9
1 3 4 4 5 7
3 1 1 2 6 6 9
1 5 5 6 9
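For contrast with the matplotlib stem plot, a stem-and-leaf display is just a grouping of digits. This minimal sketch assumes non-negative integers, with the tens digit as the stem and the ones digit as the leaf; the sample values and function names are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by tens digit (stem) and ones digit (leaf)."""
    table = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        table[stem].append(leaf)
    return dict(table)

def render(table):
    """Format the table as 'stem | leaf leaf ...' lines."""
    return "\n".join(f"{stem:>4} | {' '.join(map(str, leaves))}"
                     for stem, leaves in sorted(table.items()))

data = [16, 17, 18, 20, 22, 23, 34, 31]   # hypothetical sample values
print("stem | leaf")
print(render(stem_and_leaf(data)))
```

Nothing here is drawn; the "plot" is the text itself, which is why it is a different tool from matplotlib's stem().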
Getting ready


For this kind of plot, we need to use a sequence of discrete data, where an ordinary line plot
wouldn't make sense anyway.

Plot discrete sequences as stems, where data values are represented as markers at the end
of each stem. Stems extend from the baseline (usually at y=0) to the data point value.


How to do it... 


We will use matplotlib to draw stem plots using the stem() function. This function can use just
a series of y values, in which case the x values are generated as a simple sequence from 0 to
len(y) - 1. If we provide the stem function with both x and y sequences, they will be used for both axes.

What we want to configure in a stem plot are several arguments:

► linefmt: This is the line formatter for the stem lines

► markerfmt: The markers at the end of each stem are formatted using this argument

► basefmt: This formats the look of the baseline

► label: This defines the legend label for the stem plot

► hold: This holds the current graphs on the current axes

► bottom: This sets the position of the baseline on the y axis; the default value is 0

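The hold argument works as it does for other plotting functions. If it is on (True), all the following
plotting is added to the current axes. Otherwise, each new plot clears the current axes before drawing.
Note that hold was deprecated in matplotlib 2.0 and removed in 3.0; on newer versions, omit it,
as plots are always added to the current axes.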

To create a stem plot, perform the following steps: 

1. Generate random noise data 

2. Configure stem options 

3. Plot the stem 
Here is the code to do it:

import matplotlib.pyplot as plt 
import numpy as np 


# time domain in which we sample 
x = np.linspace(0, 20, 50)

# random function to simulate sampled signal 
y = np.sin(x + 1) + np.cos(x ** 2) 


# here we can setup baseline position 
bottom = -0.1 

# True -- hold current axes for further plotting

# False -- opposite, clear the axes before plotting
# (the hold argument was removed in matplotlib 3.0)
hold = False

# set label for legend. 
label = "delta" 

markerline, stemlines, baseline = plt.stem(x, y, bottom=bottom, 

label=label, hold=hold) 


# we use setp() here to set up

# multiple properties of lines generated by stem()
plt.setp(markerline, color='red', marker='o') 

plt.setp(stemlines, color='blue', linestyle=':') 

plt.setp(baseline, color='grey', linewidth=2, linestyle='-') 

# draw a legend 
plt.legend() 


plt.show()
This code produces the following plot:



How it Works... 


First, we need some data. For this recipe, the generated sampled pseudo-signal will suffice. In the
real world, any discrete sequential data can be properly visualized using a stem plot. We generate
this signal using NumPy's numpy.linspace, numpy.cos, and numpy.sin functions.

We then set up the label for the stem plot and the position of the baseline, which defaults to 0.

If we want to draw multiple stem plots, we set hold to True, and the following plotting calls
will be rendered over the same set of axes.

A call to matplotlib.pyplot.stem returns three objects. The first is markerline, an instance of
Line2D. It holds the reference to a line representing the stem markers themselves, rendering only
the markers and not the line connecting them. This line can be made visible by editing a
property of the Line2D instance, which we will explain soon. The second object returned,
stemlines, is a collection (a Python list at the moment) of Line2D instances representing the
stem lines, of course. The last one, baseline, is also a Line2D instance, holding a reference to
the horizontal line that represents the source of all stem lines.
We use these returned objects to manipulate the visual appeal of the stem plot, using
the setp function to apply properties to all lines (Line2D instances) in those objects or
collections of objects.

Experiment with the desired settings until you understand how setp changes your plot's style.


Drawing streamlines of vector flow 


Stream plots are used to visualize flow in vector fields. Examples from science and nature
include fields of magnetic and gravitational forces or the movement of liquid materials.

A vector field can be visualized by assigning a line and one or more arrows
to every point. The intensity can be represented by the line length, and the direction by an arrow
pointing in a particular direction.

Usually, the intensity of the force is visualized with the length of a particular streamline, but 
density can also be used for the same purpose. 


Getting ready 


To visualize vector fields, we will use matplotlib's matplotlib.pyplot.streamplot
function. This function creates plots from streamlines of a flow uniformly filling the domain.
The velocity field is interpolated and streamlines are integrated. This function was originally
designed to visualize wind patterns or liquid flow, hence we don't need strict vector lines but
a uniform representation of the vector field.

The most important arguments for this function are x and y, an evenly spaced grid given as one-dimensional
NumPy arrays, and u and v, matching two-dimensional NumPy arrays of x and y velocities. Matrices
u and v must be of such dimensions that the number of rows equals the length of y,
and the number of columns matches the length of x.
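The shape rule is easy to get backwards, so here is a small NumPy sketch of it. check_stream_inputs is a hypothetical helper (streamplot performs its own validation); it relies on np.meshgrid's default 'xy' indexing, which produces arrays with len(y) rows and len(x) columns.

```python
import numpy as np

def check_stream_inputs(x, y, u, v):
    """Raise ValueError unless u and v have len(y) rows and len(x) columns."""
    expected = (len(y), len(x))
    for name, arr in (("u", u), ("v", v)):
        if np.shape(arr) != expected:
            raise ValueError(f"{name} has shape {np.shape(arr)}, expected {expected}")

x = np.linspace(0, 4, 9)      # 9 columns
y = np.linspace(0, 2, 5)      # 5 rows
X, Y = np.meshgrid(x, y)      # default 'xy' indexing: shape (5, 9)
U, V = X, Y                   # the "velocity equals position" field used later
check_stream_inputs(x, y, U, V)   # passes silently
```

Passing a transposed array such as U.T (shape (9, 5)) would raise, which mirrors the kind of mismatch streamplot rejects.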

The line width of a stream plot can be controlled per line if the linewidth argument is given a
two-dimensional array matching the shape of the u and v velocities, or it can simply be a single
number that all lines will use.

Color can be not only a single value for all streamlines, but also a matrix shaped like the
linewidth argument.

Arrows (the FancyArrowPatch class) are used to indicate vector direction, and we can
control them using two parameters: arrowsize, the size of the arrow, and arrowstyle, the format of
the arrow (for example, "simple" or "->").
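A common way to build the per-line linewidth (and color) array described above is to derive it from the flow speed. The following is a NumPy sketch under assumptions: the rotational field (U = -Y, V = X) and the maximum width of 5 points are illustrative choices, not values from the recipe.

```python
import numpy as np

# Grid and a hypothetical rotational field, standing in for real data.
x = np.linspace(-3, 3, 40)
y = np.linspace(-2, 2, 25)
X, Y = np.meshgrid(x, y)
U, V = -Y, X

speed = np.sqrt(U ** 2 + V ** 2)
# One width per grid point, same shape as U and V: faster flow, thicker line.
lw = 5 * speed / speed.max()

print(lw.shape, lw.max())
```

These arrays could then be passed as `plt.streamplot(x, y, U, V, linewidth=lw, color=speed, arrowsize=1.5, arrowstyle='->')`, so that both thickness and color encode speed while the arrows show direction.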
How to do it...


We will start with a simple example, just to get a sense of what's going on here. Perform the
following steps:

1. Create data vectors 

2. Print intermediate values 

3. Plot the stream plot

4. Show the figure with streamlines visualizing our vectors 

Here is the code sample: 

import matplotlib.pyplot as plt 
import numpy as np 

Y, X = np.mgrid[0:5:100j, 0:5:100j]

U = X
V = Y


from pprint import pprint
print("X")
pprint(X)

print("Y")
pprint(Y)

plt.streamplot(X, Y, U, V)


plt.show()

This will give the following textual output: 


X
array([[ 0.        ,  0.05050505,  0.1010101 , ...,  4.8989899 ,  4.94949495,  5.        ],
       [ 0.        ,  0.05050505,  0.1010101 , ...,  4.8989899 ,  4.94949495,  5.        ],
       [ 0.        ,  0.05050505,  0.1010101 , ...,  4.8989899 ,  4.94949495,  5.        ],
       ...,
       [ 0.        ,  0.05050505,  0.1010101 , ...,  4.8989899 ,  4.94949495,  5.        ],
       [ 0.        ,  0.05050505,  0.1010101 , ...,  4.8989899 ,  4.94949495,  5.        ],
       [ 0.        ,  0.05050505,  0.1010101 , ...,  4.8989899 ,  4.94949495,  5.        ]])
Y
array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.        ],
       [ 0.05050505,  0.05050505,  0.05050505, ...,  0.05050505,  0.05050505,  0.05050505],
       [ 0.1010101 ,  0.1010101 ,  0.1010101 , ...,  0.1010101 ,  0.1010101 ,  0.1010101 ],
       ...,
       [ 4.8989899 ,  4.8989899 ,  4.8989899 , ...,  4.8989899 ,  4.8989899 ,  4.8989899 ],
       [ 4.94949495,  4.94949495,  4.94949495, ...,  4.94949495,  4.94949495,  4.94949495],
       [ 5.        ,  5.        ,  5.        , ...,  5.        ,  5.        ,  5.        ]])
This generates the following streamline flow figure: 
How it Works...


We create a vector field of x and y by indexing the two-dimensional mesh grid, using NumPy's
mgrid instance. We specify the range of the grid as start and stop (0 and 5, respectively).
The third index represents the step. If it is a real number, it is the spacing between points,
and the stop value is not included. If you want to include the stop value, use a complex number
for the step; its magnitude is then used as the number of points required between start and stop,
stop being inclusive.
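The difference between a real and a complex step is easiest to see on a one-dimensional grid. This sketch uses only np.mgrid; the small ranges are illustrative, and the last lines reproduce the recipe's grid to show where the 0.05050505 spacing in the printed arrays comes from.

```python
import numpy as np

# Real step: spacing between points, stop value excluded.
a = np.mgrid[0:5:1.0]
# Complex step: 6j means "6 points", stop value included.
b = np.mgrid[0:5:6j]
print(a)
print(b)

# The recipe's 100j gives 100 points from 0 to 5 inclusive, spaced 5/99 apart.
Y, X = np.mgrid[0:5:100j, 0:5:100j]
print(X.shape, X[0, 1])
```

Here a holds 0 through 4, while b holds 0 through 5: the complex step trades control over spacing for control over the point count, much like np.linspace.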

The mesh grid, fleshed out like this, is then used to compute vector velocities. Here, for the sake
of example, we just use the same mesh grid as vector velocities.

This generates a plot that clearly shows the plain linear dependency and flow of the represented
vector field.
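Why are the streamlines for U = X, V = Y straight? Following the flow means repeatedly stepping in the direction of the local velocity. The sketch below traces one streamline with simple forward Euler steps; this is an illustration only, since matplotlib uses its own adaptive integrator, and the start point (1.0, 0.5) and step size are arbitrary choices.

```python
import numpy as np

def field(p):
    """Velocity at point p = (x, y) for the recipe's field U = X, V = Y."""
    return np.array([p[0], p[1]])

def trace_streamline(start, step=0.01, n_steps=100):
    """Follow the flow from `start` using forward Euler steps."""
    points = [np.asarray(start, dtype=float)]
    for _ in range(n_steps):
        p = points[-1]
        points.append(p + step * field(p))
    return np.array(points)

line = trace_streamline((1.0, 0.5))
ratios = line[:, 1] / line[:, 0]     # y / x along the traced path
print(ratios.min(), ratios.max())
```

Because each step scales both coordinates by the same factor, y / x never changes: every streamline is a straight ray pointing away from the origin, which is the "plain linear dependency" visible in the figure.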

Play with the values of U and V to get a sense of how they influence the stream plot.
For example, make U = np.sin(X) or V = np.sin(Y). Following that, try to change the
start and stop values. The following figure shows U = np.sin(X):



Bear in mind that the plot we created is generated by a set of lines and arrow patches;
hence, there is no way (currently, at least) to update the existing figure. Lines and arrows
know nothing about vectors and fields. Future implementations might bring about this change,
but at the moment, this is a known limitation in the current version of matplotlib.
There's more...


Of course, this example gives you an opportunity to get to know and understand matplotlib's
stream plot features and capabilities.

The real power comes when you have real data at hand to play with. After understanding this
recipe, you will be able to recognize the tools you have. So when you are given data and
you know its domain, you will be able to pick the best tool for the job.


Using colormaps 


Color coding the data can have a great impact on how your visualizations are perceived by the
viewer, as viewers come with assumptions about color and what that color represents.

Being explicit when color is used to add additional information to the data is always good.

Knowing when and how to use color in your visualizations is even better.


Getting ready 


If your data is not naturally color coded (such as earth/terrain altitudes or object temperature),
it's better not to make any artificial mappings to natural coloring. We want to understand the
data appropriately and choose colors that help the reader decode the data easily. We don't
want readers constantly trying to suppress a learned color mapping for temperatures if we are
representing financial data that has no connection with Kelvins or Celsius.

If possible, avoid the usual red/green associations if there are no strong correlations in the
data to associate them with those colors.

To help you pick the right color mapping, we will explain some colormaps available in the
matplotlib package that can save a lot of time and help us, if we know what they are used
for and how to find them.

Colormaps, in general, can be categorized as follows:

► Sequential: Monochromatic colormaps of two color tones from low to high saturation
of the same color. For example, from white to bright blue. Ideal for most cases,
as they clearly show the change from low to high values.

► Diverging: The central point here is the median value (usually some light color), but
then the ranges go from there to two different color tones, one for high and one for
low values. This can be ideal for data with a significant median value. For example,
when the median is at 0, it clearly shows the difference between negative and
positive values.
► Qualitative: For cases where data has no inherent ordering, and all you want is to
make sure different categories are easily discernible from each other. 

► Cyclic: It is used where data can wrap around at the endpoint values, such as representing
time of day, wind direction, phase angle, or similar.
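A diverging colormap only works as intended if the meaningful center of the data lands on the light middle of the map. This plain-Python sketch shows the normalization idea: each side of the center is scaled separately into [0, 0.5] and [0.5, 1]. The profit/loss range is hypothetical; recent matplotlib versions provide matplotlib.colors.TwoSlopeNorm for the same purpose.

```python
def diverging_norm(value, vmin, vcenter, vmax):
    """Map value into [0, 1] so that vcenter lands exactly on 0.5.

    Values below the center are scaled into [0, 0.5], values above it
    into [0.5, 1]; out-of-range values are clipped first.
    """
    if not vmin < vcenter < vmax:
        raise ValueError("need vmin < vcenter < vmax")
    value = min(max(value, vmin), vmax)
    if value <= vcenter:
        return 0.5 * (value - vmin) / (vcenter - vmin)
    return 0.5 + 0.5 * (value - vcenter) / (vmax - vcenter)

# Hypothetical profit/loss data ranging from -2 to +6, centered at 0.
for v in (-2, -1, 0, 3, 6):
    print(v, diverging_norm(v, vmin=-2, vcenter=0, vmax=6))
```

With a plain linear scaling from -2 to 6, zero would map to 0.25 and appear as a "negative" tone; splitting the scale keeps break-even on the neutral color.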

Matplotlib comes with a lot of predefined maps, and we can divide them into several
categories. We will suggest when to use some of these colormaps.

The most common and basic colormaps are autumn, bone, cool, copper, flag, gray, hot,
hsv, jet, pink, prism, spring, summer, winter, and spectral.

We have another set of colormaps coming from the "Yorick scientific visualization package".
This is an evolution of the GIST package, so all colormaps in this collection have gist_ as a prefix
in their name.


Yorick is a visualization package and also an interpreted language,
written in C, not very active lately. You can find more information on the
official website: http://yorick.sourceforge.net/index.php

This colormap set contains the following maps: gist_earth, gist_heat, gist_ncar,
gist_rainbow, and gist_stern.

Then, we have the following colormaps based on Color Brewer (http://colorbrewer.org),
which we can categorize as: Diverging, where luminance is highest at the midpoint and
decreases towards the different endpoints; Sequential, where luminance decreases monotonically;
and Qualitative, where different sets of colors are used to differentiate data categories.

Some miscellaneous colormaps are also available:



Colormap    Description

brg         Blue-red-green
bwr         Diverging blue-white-red
coolwarm    Useful for 3D shading, color blindness, and ordering of colors
rainbow     Spectral purple-blue-green-yellow-orange-red colormap with diverging luminance
seismic     Diverging blue-white-red
terrain     Mapmaker's colors, blue-green-yellow-brown-white, originally from IGOR Pro software


Most of the maps presented here can be reversed by putting the _r suffix after the name of the
colormap; for example, hot_r is the reversed version of hot.
How to do it...


We can set a colormap on many items in matplotlib. For example, a colormap can be set
on image, pcolor, and scatter. This is usually accomplished via a function argument
called cmap. This argument expects an instance of colors.Colormap (most functions also
accept the name of a registered colormap).

We can also use matplotlib.pyplot.set_cmap to set the default colormap, which is also applied
to the current image, if there is one.

You can get all available colormaps easily with matplotlib.pyplot.colormaps. Fire up
IPython and type in the following:

In [1]: import matplotlib.pyplot as plt 

In [2]: plt.colormaps() 

Out[2]:
['Accent',
 'Accent_r',
 'Blues',
 'Blues_r',
 ...
 'winter',
 'winter_r']

Note that we have shortened the preceding list because it contains around 140 items and
would span several pages here.

This will import the pyplot function interface and allow us to call the colormaps function,
which returns a list of all registered colormaps.

Finally, we want to show you how to make a nice-looking colormap. In the following example,
we need to:

1. Use the Color Brewer website to get divergent colormap color values in hex format

2. Generate a random sample of x and y, where y is the cumulative sum of values (simulating
stock price variations)

3. Apply customization to matplotlib's scatter plot function

4. Tweak the scatter marker line color and width to make the plot more readable and
pleasant for viewers.

import matplotlib as mpl 
import matplotlib.pyplot as plt 
import numpy as np 
# Red Yellow Green divergent colormap (ColorBrewer RdYlGn, 9 classes):
# one color per series drawn in the loop below
red_yellow_green = ['#d73027', '#f46d43', '#fdae61',
                    '#fee08b', '#ffffbf', '#d9ef8b',
                    '#a6d96a', '#66bd63', '#1a9850']

sample_size = 1000

fig, ax = plt.subplots(1)

for i in range(9):
    y = np.random.normal(size=sample_size).cumsum()
    x = np.arange(sample_size)
    ax.scatter(x, y, label=str(i), linewidth=0.1,
               edgecolors='grey',
               facecolor=red_yellow_green[i])

ax.legend()
plt.show()

This will render a nice-looking figure:
How it Works...


We used the ColorBrewer website to find the colors of the red, yellow, and green divergent
colormap. Then, we listed those colors in our code and applied them to
our scatter plot.
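ColorBrewer hands out colors as hex strings such as '#d73027'. matplotlib accepts these directly, but it can help to see what they encode: three two-digit hexadecimal channels for red, green, and blue. The sketch below converts them to the 0 to 1 floats that matplotlib uses internally; matplotlib.colors.to_rgb performs the same conversion, and hex_to_rgb is a hypothetical name.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' into an (r, g, b) tuple of floats in [0, 1]."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    # Each pair of hex digits is one 0-255 channel value.
    return tuple(int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))

print(hex_to_rgb("#d73027"))   # the red end of the RdYlGn palette
```

Working with the numeric form makes it easy to check, for example, that the palette's lightness really peaks in the middle, as a diverging scheme should.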

ColorBrewer is a web tool, built by Cynthia Brewer, Mark Harrower, and The
Pennsylvania State University as a tool to explore color maps. It is a very
handy tool for picking color maps of different ranges and seeing them applied
on a map with slight variations, so that you immediately sense what they will
look like on a chart. This particular map is at http://colorbrewer2.org/?type=diverging&scheme=RdYlGn&n=9.

Sometimes, we will have to make our customizations in matplotlib.rcParams, which is the
first thing we want to do before we create a figure or any of the axes.

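For example, matplotlib.rcParams['image.cmap'] is the configuration setting
we want to change in order to set up the default colormap for most of the matplotlib functions
that use one. (The default cycle of line colors is a separate setting, axes.prop_cycle, which
replaced axes.color_cycle in matplotlib 1.5.)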



There's more... 


Using matplotlib.pyplot.register_cmap, we can register a new colormap with matplotlib,
so it can be found using the get_cmap function. We can use it in two different ways. Here are
both signatures:

► register_cmap(name='swirly', cmap=swirly_cmap) 

► register_cmap(name='choppy', data=choppydata, lut=128) 

The first signature allows us to specify the colormap as an instance of colors.Colormap and
register it via the name argument. The name argument can be omitted, in which case it will be
inherited from the name attribute of the cmap instance provided.

With the latter one, we are passing three arguments to the linear segmented colormap constructor,
and registering that colormap afterwards.

Using matplotlib.pyplot.get_cmap, we can get the colors.Colormap instance using the
name argument.

Here's how to make your own map using matplotlib.colors.LinearSegmentedColormap:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# each row is (x, value approached from below x, value starting above x)
cdict = {'red': ((0.0, 0.0, 0.0),
                 (0.5, 1.0, 0.7),
                 (1.0, 1.0, 1.0)),
         'green': ((0.0, 0.0, 0.0),
                   (0.5, 1.0, 0.0),
                   (1.0, 1.0, 1.0)),
         'blue': ((0.0, 0.0, 0.0),
                  (0.5, 1.0, 0.0),
                  (1.0, 0.5, 1.0))}

my_cmap = LinearSegmentedColormap('my_colormap', cdict, 256)

plt.pcolor(np.random.rand(10, 10), cmap=my_cmap)
plt.colorbar()
plt.show()

This is the simplest part, while the hardest part is to actually come up with a combination of
colors that are informative, do not take away from the data we want to visualize, and
are pleasant for the eyes of the viewer.
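The cdict format is compact but easy to misread. Each channel is a list of rows (x, y0, y1): y0 is the value reached when approaching x from below, and y1 is the value the next segment starts from. A jump such as the red row (0.5, 1.0, 0.7) therefore creates a sharp break. This plain-Python sketch evaluates one channel that way; channel_value is a hypothetical helper, while matplotlib does the real work (at higher speed, via lookup tables) inside LinearSegmentedColormap.

```python
def channel_value(x, rows):
    """Evaluate one LinearSegmentedColormap channel at x in [0, 1].

    rows: sequence of (x_i, y0_i, y1_i) with x_i increasing from 0 to 1.
    Between x_i and x_{i+1} the value goes linearly from y1_i to y0_{i+1}.
    """
    for (xa, _, ya1), (xb, yb0, _) in zip(rows, rows[1:]):
        if xa <= x <= xb:
            return ya1 + (yb0 - ya1) * (x - xa) / (xb - xa)
    raise ValueError("x must lie within [0, 1]")

# The 'red' entry of the cdict used above.
red = ((0.0, 0.0, 0.0), (0.5, 1.0, 0.7), (1.0, 1.0, 1.0))
print(channel_value(0.25, red))   # halfway from 0.0 up to 1.0
print(channel_value(0.75, red))   # halfway from 0.7 up to 1.0
```

Setting y0 equal to y1 in every row gives a smooth, continuous channel; making them differ is how you introduce deliberate steps into a colormap.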

For the base map list (the colormaps listed earlier), we can use the pylab shortcut
to set the colormap. For example, the following code would set the colormap of the image X to
cmap='hot':

imshow(X)
hot()


Using scatter plots and histograms 


Scatter plots are encountered very often, as they are the most common plot to
visualize the relation between two variables. If we want to take a quick look at the data
and see if there is any relation between the variables (that is, correlation), we would draw a quick
scatter plot. For a scatter plot to be most informative, we usually have one variable that can be systematically
changed by, for example, an experimenter, so we can inspect the possibility of it influencing
another variable.

That's why, in this recipe, you will learn how to understand scatter plots.


Getting ready 


We want to see, for example, how two events are affected by each other, or if they are affected
at all. This visualization is especially useful on large sets of data, where we cannot draw any
conclusions by looking at the data in its native form, when it is just numbers.

Correlation between values, if there is any, can be positive or negative. Positive correlation
is when, for increasing X values, the Y values are increasing too. In negative correlation, for
increasing X values, Y values are decreasing. In an ideal case, positive correlation is a line
starting from the bottom-left corner of the axes to the top-right corner. Ideal negative correlation is a line
starting from the top-left corner to the bottom-right corner of the axes.
Ideal positive correlation between two variables is given the value 1, and ideal negative
correlation the value -1. Everything inside this interval represents a weaker correlation between
two variables. Usually, everything inside -0.5 to 0.5 is not considered valuable from the perspective
of two variables being in a real connection.
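The value described here is usually the Pearson correlation coefficient, which NumPy computes with np.corrcoef. The sketch below uses a seeded uniform random series as a stand-in for the recipe's random dataset and reproduces the three reference cases: a series against itself, against its negation, and against unrelated noise.

```python
import numpy as np

rng = np.random.default_rng(42)
d1 = rng.random(365)          # uniform noise, like the recipe's random dataset
other = rng.random(365)       # a second, unrelated noise series

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    return np.corrcoef(a, b)[0, 1]

print(pearson(d1, d1))        # ideal positive correlation
print(pearson(d1, -d1))       # ideal negative correlation
print(pearson(d1, other))     # close to 0: no correlation
```

The third value will not be exactly 0; with 365 points, unrelated series typically land within about plus or minus 0.1, well inside the "not valuable" band mentioned above.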

An example of positive correlation would be the amount of money put in a charity jar being directly,
positively correlated with the number of people seeing the jar. An example of negative correlation is
between average travel speed and the time required to reach place B from place A: the faster we
travel, the less time we need to complete the trip.

The charity jar example presented here is a positive correlation, but it is not perfect,
as different people might put different amounts of money in per visit. But, in general, we can
assume that the more people see that jar, the more money will be left inside.

Keep in mind, though, that even if the scatter plot displays a correlation between two variables,
that correlation might not be a direct one. There might be a third variable that influences
both plotted variables, so the correlation only reflects that both plotted values are correlated
with that third variable. In the end, the correlation might be just apparent, with no real relation
behind it.
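A short simulation makes the "third variable" warning concrete. In this sketch (all data is synthetic and the hidden driver z is hypothetical), a and b never depend on each other, only on z, yet they correlate strongly. Once z's known contribution is removed, the remaining parts are essentially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical hidden driver (think: daily temperature).
z = rng.normal(size=1000)

# Two quantities that depend only on z, never on each other.
a = z + 0.3 * rng.normal(size=1000)
b = z + 0.3 * rng.normal(size=1000)

r_ab = np.corrcoef(a, b)[0, 1]
# Remove z's contribution (possible here only because we built the data).
r_resid = np.corrcoef(a - z, b - z)[0, 1]
print(round(r_ab, 2), round(r_resid, 2))
```

A scatter plot of a against b would show a tight diagonal cloud, even though changing a would have no effect on b. With real data, z is usually unknown, which is exactly why an apparent correlation needs domain knowledge before it is interpreted.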


How to do it... 


With the following code sample, we will demonstrate how a scatter plot can explain the relation
between variables.

The data we use is obtained using the Google Trends web portal, where one can download a
CSV file containing normalized values of relative search volumes for given parameters.

We will store our data in the ch07_search_data.py Python module, so we can import it in
subsequent code recipes.

Here are its contents:

# ch07_search_data.py

# daily search trend for keyword 'flowers' for a year
DATA = [
    1.04, 1.04, 1.22, 1.26, 1.34, 1.26, 1.06, 1.06, 1.00, 1.02, 1.02, 1.02, 1.16,
    1.22, 1.46, 1.40, 1.52, 2.56, 1.04, 1.02, 1.06, 2.34, 1.16, 1.12, 1.36, 1.30,
    1.20, 1.02, 1.04, 0.98, 1.24, 1.30, 1.44, 1.12, 1.12, 1.12, 0.98, 0.98, 1.00,
    1.00, 1.02, 1.04, 0.74, 0.96, 0.94, 0.94, 0.94, 0.96, 0.86, 0.92, 0.98, 1.08,
    0.98, 1.02, 1.22, 1.10, 1.02, 1.12, 1.34, 2.02, 1.68, 1.12, 1.38, 1.14, 1.16,
    1.14, 1.16, 1.00, 1.00, 1.28, 1.44, 2.58, 1.30, 1.20, 1.16, 1.06, 1.06, 1.08,
    0.92, 1.00, 1.06, 1.06, 1.02, 1.00, 1.06, 1.10, 1.14, 1.08, 1.00, 1.04, 1.10,
    1.06, 1.02, 1.08, 0.80, 1.04, 0.96, 0.96, 0.96, 0.92, 0.84, 0.88, 0.90, 1.00,
    0.90, 0.98, 1.30, 1.10, 1.00, 1.10, 1.24, 1.66, 1.94, 1.02, 1.06, 1.08, 1.10,
    1.12, 1.20, 0.98, 0.94, 1.16, 1.26, 1.42, 2.18, 1.26, 1.06, 1.00, 1.04, 1.00,
    0.88, 0.98, 0.96, 0.96, 0.96, 0.92, 0.94, 0.96, 0.96, 0.94, 0.90, 0.92, 0.96,
    0.98, 0.90, 1.00, 0.68, 0.90, 0.88, 0.88, 0.88, 0.90, 0.78, 0.84, 0.86, 0.92,
    0.82, 0.90, 0.98, 1.00, 0.88, 0.98, 1.08, 1.36, 2.04, 0.98, 0.96, 1.02, 1.20,
    1.08, 0.98, 0.86, 0.88, 1.02, 1.14, 1.28, 2.04, 1.16, 1.04, 0.96, 0.98, 0.92,
    0.82, 0.92, 0.86, 0.84, 0.90, 0.86, 0.84, 0.86, 0.90, 0.84, 0.82, 0.82, 0.86,
    0.84, 0.82, 0.90, 0.60, 0.80, 0.78, 0.78, 0.76, 0.74, 0.68, 0.74, 0.80, 0.80,
    0.72, 0.80, 0.94, 0.90, 0.82, 0.86, 0.94, 1.24, 1.92, 0.92, 1.12, 0.90, 0.90,
    0.90, 0.94, 0.82, 0.84, 0.98, 1.08, 1.24, 2.04, 1.04, 0.94, 0.86, 0.86, 0.86,
    0.76, 0.80, 0.78, 0.78, 0.80, 0.80, 0.78, 0.80, 0.82, 0.76, 0.76, 0.76, 0.76,
    0.76, 0.76, 0.74, 0.64, 0.72, 0.74, 0.70, 0.68, 0.72, 0.70, 0.64, 0.70, 0.72,
    0.62, 0.74, 1.16, 1.02, 0.80, 0.82, 0.88, 1.02, 1.66, 0.94, 0.94, 0.96, 1.00,
    1.04, 1.06, 1.02, 0.94, 1.02, 1.10, 1.22, 1.94, 1.18, 1.12, 1.06, 1.06, 1.04,
    0.94, 0.98, 0.90, 0.84, 0.96, 0.96, 0.98, 1.00, 0.96, 0.92, 0.90, 0.86, 0.82,
    0.84, 0.82, 0.80, 0.76, 0.80, 0.80, 0.76, 0.80, 0.82, 0.80, 0.72, 0.72, 0.76,
    0.70, 0.74, 0.90, 0.92, 0.82, 0.84, 0.88, 0.98, 1.44, 0.96, 0.88, 0.92, 1.08,
    0.96, 0.94, 0.86, 0.82, 1.04, 1.08, 1.14, 1.66, 1.08, 0.96, 0.90, 0.86, 0.84,
    0.84, 0.82, 0.84, 0.84, 0.84, 0.84, 0.82, 0.86, 0.82, 0.82, 0.86, 0.90, 0.84,
    0.82, 0.78, 0.80, 0.78, 0.74, 0.78, 0.76, 0.76, 0.70, 0.72, 0.76, 0.72, 0.70,
    0.64]

We need to perform the following steps: 

1. Use a cleaned dataset of Google Trends search volume for one year for the keyword
'flowers'. We will import this dataset into variable d.

2. Use a random uniform distribution of the same length (365 data points) as our
Google Trends dataset. This will be dataset d1.

3. Create a figure containing four subplots.

4. In the first subplot, plot a scatter plot of d and d1.

5. In the second subplot, plot a scatter plot of d1 with d1.

6. In the third subplot, render a scatter plot of d1 with inverted d1.

7. In the fourth subplot, render a scatter plot of d1 with a similar dataset constructed
from (d1 + d).

This code illustrates the relationships we explained earlier in this recipe:

import matplotlib.pyplot as plt
import numpy as np

# import the data
from ch07_search_data import DATA
d = DATA

# Now let's generate random data for the same period
d1 = np.random.random(365)

assert len(d) == len(d1)

fig = plt.figure()

ax1 = fig.add_subplot(221)
ax1.scatter(d, d1, alpha=0.5)
ax1.set_title('No correlation')
ax1.grid(True)

ax2 = fig.add_subplot(222)
ax2.scatter(d1, d1, alpha=0.5)
ax2.set_title('Ideal positive correlation')
ax2.grid(True)

ax3 = fig.add_subplot(223)
ax3.scatter(d1, d1 * -1, alpha=0.5)
ax3.set_title('Ideal negative correlation')
ax3.grid(True)

ax4 = fig.add_subplot(224)
# d1 + d: the random signal with the real data added as "noise"
ax4.scatter(d1, d1 + d, alpha=0.5)
ax4.set_title('Non ideal positive correlation')
ax4.grid(True)

plt.tight_layout()

plt.show()

This is the figure we should get when the preceding code is executed:

How it Works...


The preceding figure clearly shows whether there is any correlation between the different datasets. The second (top right) subplot shows the ideal, or perfect, positive correlation of dataset d1 with itself (obviously). The fourth subplot (bottom right) hints that there is a positive correlation, although not an ideal one. We constructed this dataset from d1 (random) and d (real data) to simulate two similar signals (events), where the second one (d1 + d) contains a certain amount of randomness (or noise), but can still be compared with the original (d1) signal.


There's more... 


We can also add histograms to scatter plots in such a way that they tell us more about the plotted data. We can add horizontal and vertical histograms to show the frequencies of data points along the X and Y axes. This way, we can see both the summary of the whole dataset (histogram) and the individual data points (scatter plot) at the same time.

Here is an example of the code that generates a scatter-histogram combination, using the same two datasets we introduced in this recipe. The heart of the code is the scatterhist() function, which is written to be reusable with different datasets; it sets some of its variables (the number of bins in the histograms, the axis limits, and similar) based on the dataset provided.

We start with the usual imports:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

This is the definition of our function that generates scatter histograms for a given x, y dataset and, optionally, a figsize parameter:

def scatterhist(x, y, figsize=(8, 8)):
    """
    Create simple scatter & histograms of data x, y inside given plot

    :param figsize: Figure size to create figure
    :type figsize: Tuple of two floats representing size in inches

    :param x: X axis data set
    :type x: np.array

    :param y: Y axis data set
    :type y: np.array
    """
    # plt.subplots() returns (figure, axes); we only need the axes
    _, scatter_axes = plt.subplots(figsize=figsize)

    # the scatter plot:
    scatter_axes.scatter(x, y, alpha=0.5)
    scatter_axes.set_aspect(1.)

    divider = make_axes_locatable(scatter_axes)

    axes_hist_x = divider.append_axes(position="top", sharex=scatter_axes,
                                      size=1, pad=0.1)

    axes_hist_y = divider.append_axes(position="right", sharey=scatter_axes,
                                      size=1, pad=0.1)

    # compute bins accordingly
    binwidth = 0.25

    # global max value in both data sets
    xymax = np.max([np.max(np.fabs(x)), np.max(np.fabs(y))])

    # symmetric bin limit, rounded up so the largest value is not dropped
    bincap = (int(xymax / binwidth) + 1) * binwidth

    bins = np.arange(-bincap, bincap + binwidth, binwidth)

    nx, binsx, _ = axes_hist_x.hist(x, bins=bins, histtype='stepfilled',
                                    orientation='vertical')

    ny, binsy, _ = axes_hist_y.hist(y, bins=bins, histtype='stepfilled',
                                    orientation='horizontal')

    tickstep = 50

    ticksmax = np.max([np.max(nx), np.max(ny)])

    xyticks = np.arange(0, ticksmax + tickstep, tickstep)

    # hide the X and Y ticklabels on the histograms that face the scatter plot
    axes_hist_x.tick_params(axis='x', labelbottom=False)
    axes_hist_x.set_yticks(xyticks)

    axes_hist_y.tick_params(axis='y', labelleft=False)
    axes_hist_y.set_xticks(xyticks)

    plt.show()
Now, we proceed with loading the data and calling the function to generate and render the desired chart:

if __name__ == '__main__':
    # import the data
    from ch07_search_data import DATA as d

    # Now let's generate random data for the same period
    d1 = np.random.random(365)

    assert len(d) == len(d1)

    # try with normally distributed random data instead
    # d = np.random.randn(1000)
    # d1 = np.random.randn(1000)

    scatterhist(d, d1)

This should generate the following figure:

Plotting the cross correlation between two variables


If we have two different datasets from two different observations, we want to know whether those two sets of events are correlated. We want to cross-correlate them and see if they match in any way. We are looking for a pattern of a smaller data sample in a larger data sample.

The pattern does not have to be an obvious or simple one.


Getting ready 


We can use matplotlib's matplotlib.pyplot.xcorr function. This function plots the correlation between two datasets in such a way that we can see whether there is any significant pattern between the plotted values. It assumes that x and y are of the same length.

If we pass the normed argument as True, we normalize by the cross-correlation at the 0-th lag (that is, when there is no time delay or time lag).

Behind the scenes, the correlation is computed using NumPy's numpy.correlate function.

Using the usevlines argument (setting it to True), we can instruct matplotlib to use vlines() instead of plot() to draw the lines of the correlation plot. The main difference is that, if we use plot(), we can style the lines using standard Line2D properties passed in the **kwargs argument to the matplotlib.pyplot.xcorr function.
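To see what normed=True does, here is a minimal NumPy sketch that mirrors, as we understand it, the computation behind matplotlib's xcorr: np.correlate() in "full" mode, divided by sqrt(dot(x, x) * dot(y, y)), with lags running from -(N - 1) to N - 1. The helper name normed_xcorr and the toy pulse signals are our own illustration.

```python
import numpy as np


def normed_xcorr(x, y):
    """Return (lags, coefficients) for the normalized cross-correlation of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must be of the same length")
    c = np.correlate(x, y, mode="full")         # one value per lag
    c /= np.sqrt(np.dot(x, x) * np.dot(y, y))   # scale so a perfect match gives 1
    lags = np.arange(-(len(x) - 1), len(x))     # lag of each entry in c
    return lags, c


pulse = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]   # the same pulse, 2 samples later

lags, c = normed_xcorr(pulse, shifted)
print(lags[np.argmax(c)], c.max())
```

The peak (1.0) sits at lag -2: with this argument order, a y that lags behind x shows up at a negative lag. Correlating a signal with itself always gives 1.0 at lag 0, which is why that is the natural normalization point.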


How to do it... 


In the following example, we need to:

1. Import the matplotlib.pyplot module.

2. Import the numpy package.

3. Use a cleaned dataset of the Google search volume trend over a year for the keyword 'flowers'.

4. Plot the datasets (the real one and the artificial one) and the cross-correlation diagram.

5. Tighten the layout in order to have a better overview of labels and ticks.

6. Add appropriate labels and grids for easier understanding of the plot.
This is the code that will perform the previously mentioned steps:

import matplotlib.pyplot as plt
import numpy as np

# import the data
from ch07_search_data import DATA as d

# mean-center the real data
total = sum(d)
av = total / len(d)
z = [i - av for i in d]

# Now let's generate random data for the same period
d1 = np.random.random(365)

assert len(d) == len(d1)

# mean-center the random data
total1 = sum(d1)
av1 = total1 / len(d1)
z1 = [i - av1 for i in d1]

fig = plt.figure()

# Search trend volume
ax1 = fig.add_subplot(311)
ax1.plot(d)
ax1.set_xlabel('Google Trends data for "flowers"')

# Random: "search trend volume"
ax2 = fig.add_subplot(312)
ax2.plot(d1)
ax2.set_xlabel('Random data')

# Is there a pattern in search trend for this keyword?
ax3 = fig.add_subplot(313)
ax3.set_xlabel('Cross correlation of random data')
ax3.xcorr(z, z1, usevlines=True, maxlags=None, normed=True, lw=2)
ax3.grid(True)
plt.ylim(-1, 1)

plt.tight_layout()

plt.show()
This code will render the following figure:




How it Works... 


We used a real dataset with a recognizable pattern in it (two peaks repeating in a similar manner across the dataset; see the preceding figure). The other dataset is just uniformly distributed random data of the same length as the real data collected from the public Google Trends service.

We plotted both datasets in the top two subplots of the figure to visualize the data.

Using matplotlib's xcorr, which in turn uses NumPy's correlate() function, we computed the cross-correlation and plotted it in the bottom subplot.

Cross-correlation computation in NumPy returns an array of correlation coefficients that represent the degree of similarity of the two datasets (or signals, as they are usually called in the signal processing field).

The cross-correlation diagram (correlogram) tells us that these two signals are not correlated, which is shown by the heights of the correlation values (the vertical lines at certain time lags).

We see that no vertical line (correlation coefficient at time lag n) rises above 0.5.


If, for example, the two datasets were correlated at time lag 100 (for example, a 100-second shift between observations of the same object by two different sensors), we would see a vertical line (representing the correlation coefficient) at x = 100 in the preceding figure, or at x = -100, depending on which of the two signals is the delayed one.
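The two-sensor case can be simulated directly. In this sketch (our own construction, with made-up, seeded data), the second sensor records the same random signal 100 samples later, and the peak of the normalized cross-correlation reveals the delay. With the np.correlate(x, y) argument order that xcorr uses, a y that is delayed relative to x peaks at a negative lag; swapping the arguments flips the sign.

```python
import numpy as np


def peak_lag(x, y):
    """Lag (in samples) at which the normalized cross-correlation of x and y peaks."""
    x = np.asarray(x, dtype=float) - np.mean(x)   # mean-center, as the recipe does
    y = np.asarray(y, dtype=float) - np.mean(y)
    c = np.correlate(x, y, mode="full")
    c /= np.sqrt(np.dot(x, x) * np.dot(y, y))
    lags = np.arange(-(len(x) - 1), len(x))
    return lags[np.argmax(c)], c.max()


rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)   # what the first sensor observes
shift = 100                          # the second sensor sees it 100 samples later
delayed = np.concatenate([np.zeros(shift), signal[:-shift]])

lag_xy, peak_xy = peak_lag(signal, delayed)
lag_yx, _ = peak_lag(delayed, signal)
print(lag_xy, lag_yx, round(peak_xy, 2))
```

Because only 900 of the 1,000 samples overlap at the true shift, the peak stays a little below 1; for unrelated random data, the coefficients at every lag would stay close to 0, as in the correlogram above.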


Using the Right Plots to Understand Data 


Importance of autocorrelation 


Autocorrelation represents the degree of similarity between a given time series and a lagged (that is, delayed in time) version of itself over successive time intervals. It occurs in time series studies when the errors associated with a given time period carry over into future time periods. For example, if we are predicting the growth of stock dividends, an overestimate in 1 year is likely to lead to overestimates in the succeeding years.

Time series data arise in many different scientific applications and in many financial processes. Some examples include generated reports of financial performance, prices over time, computing volatility, and others.

If we are analyzing unknown data, autocorrelation can help us detect whether the data is random. For that, we can use a correlogram. It can help answer questions such as: Is the data random? Is this time series white noise? Is it sinusoidal? Is it autoregressive? What is the model of this time series data?
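The definition translates into a few lines of NumPy. The function below is a sketch (the name autocorr is ours): it mean-centers the series, as the recipe does with z, and normalizes by the lag-0 sum of squares, which as we understand it is the same scaling acorr(..., normed=True) applies. Comparing seeded white noise with a sine wave of period 30 shows the two extremes a correlogram can reveal.

```python
import numpy as np


def autocorr(series, max_lag):
    """Normalized autocorrelation r_k for lags k = 0..max_lag (r_0 is always 1)."""
    z = np.asarray(series, dtype=float)
    z = z - z.mean()                  # mean-center the data
    denom = np.dot(z, z)              # lag-0 sum of squares
    return np.array([np.dot(z[:len(z) - k], z[k:]) / denom
                     for k in range(max_lag + 1)])


rng = np.random.default_rng(1)
noise = rng.standard_normal(365)          # white noise
t = np.arange(365)
seasonal = np.sin(2 * np.pi * t / 30)     # repeats every 30 days

r_noise = autocorr(noise, 60)
r_seasonal = autocorr(seasonal, 60)

print("white noise, largest |r_k| for k >= 1:", np.abs(r_noise[1:]).max().round(2))
print("sine, r_15 and r_30:", r_seasonal[15].round(2), r_seasonal[30].round(2))
```

For white noise, every coefficient after lag 0 stays small; the sine wave swings to a strong negative value at half its period (15) and back to a strong positive value at its full period (30).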


Getting ready 


We will use matplotlib to compare two sets of data. One is the daily Google Trends search volume for a certain keyword over 1 year (365 days). The other set is 365 uniformly distributed random measurements (generated with NumPy).

We will autocorrelate both datasets and compare how the correlograms visualize patterns 
in data. 


How to do it... 


In this section, we will perform the following steps:

1. Import the matplotlib.pyplot module

2. Import the numpy package

3. Use a cleaned dataset of Google search volume for a year

4. Plot the dataset and plot its autocorrelation diagram

5. Generate a random dataset of the same length using NumPy

6. Plot the random dataset on the same figure and plot its autocorrelation diagram

7. Add appropriate labels and grids for easier understanding of the plot
This is the code:

import matplotlib.pyplot as plt
import numpy as np

# import the data
from ch07_search_data import DATA as d

# mean-center the real data
total = sum(d)
av = total / len(d)
z = [i - av for i in d]

fig = plt.figure()

# plt.title('Comparing autocorrelations')

# Search trend volume
ax1 = fig.add_subplot(221)
ax1.plot(d)
ax1.set_xlabel('Google Trends data for "flowers"')

# Is there a pattern in search trend for this keyword?
ax2 = fig.add_subplot(222)
ax2.acorr(z, usevlines=True, maxlags=None, normed=True, lw=2)
ax2.grid(True)
ax2.set_xlabel('Autocorrelation')

# Now let's generate random data for the same period
d1 = np.random.random(365)

assert len(d) == len(d1)

# mean-center the random data
total = sum(d1)
av = total / len(d1)
z = [i - av for i in d1]

# Random: "search trend volume"
ax3 = fig.add_subplot(223)
ax3.plot(d1)
ax3.set_xlabel('Random data')

# Is there a pattern in the random data?
ax4 = fig.add_subplot(224)
ax4.set_xlabel('Autocorrelation of random data')
ax4.acorr(z, usevlines=True, maxlags=None, normed=True, lw=2)
ax4.grid(True)

plt.show()
This code will render the following figure:



[Figure: panels include "Random data" and "Autocorrelation of random data"]


How it Works... 


Looking at the left-hand side plots, it is easy to spot patterns in the search volume data (top left), whereas the bottom left plot shows uniformly distributed random data, where patterns are not obvious but might still exist.

Computing and plotting the autocorrelation of the random data, we see a high correlation at 0, which is expected: data is correlated with itself at no time lag. But before or after zero time lag, the signal is almost 0, so we can safely conclude that there is no correlation between the signal in the original time and any of the time lags examined.

Looking at the real data (the Google search volume trend), we see the same behavior at 0 time lag, which we can expect for any autocorrelated signal. But we also have strong signals at around 30, 60, and 110 days after 0 time lag. This indicates that there is a pattern in this particular search term and in the way people search for it on the Google search engine.

Explaining why is a very different story, and we will leave this exercise to the reader. Remember that correlation and causation are two very different things.
There's more...


Autocorrelation is used very often when we want to identify a model for unknown data and try to fit the data to that model. How data correlates with itself is sometimes the first step in identifying an appropriate model for a dataset we are presented with. This requires more than Python; it requires knowledge of mathematical modeling. Various statistical tests (the Ljung-Box test, the Box-Pierce test, and so on) will help us answer these questions.
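As a starting point, both test statistics can be computed from the sample autocorrelations r_k over lags k = 1..h: Box-Pierce Q = n * sum(r_k^2), and Ljung-Box Q = n(n + 2) * sum(r_k^2 / (n - k)). Under the null hypothesis that the series is white noise, both are approximately chi-squared distributed with h degrees of freedom (for h = 10, the 5% critical value is about 18.31). This sketch is our own and computes only the statistics; if statsmodels is installed, its statsmodels.stats.diagnostic.acorr_ljungbox function also reports p-values.

```python
import numpy as np


def portmanteau(series, h):
    """Return (box_pierce, ljung_box) statistics over lags 1..h."""
    z = np.asarray(series, dtype=float)
    z = z - z.mean()
    n = len(z)
    denom = np.dot(z, z)
    r = np.array([np.dot(z[:n - k], z[k:]) / denom for k in range(1, h + 1)])
    k = np.arange(1, h + 1)
    box_pierce = n * np.sum(r ** 2)
    ljung_box = n * (n + 2) * np.sum(r ** 2 / (n - k))
    return box_pierce, ljung_box


rng = np.random.default_rng(2)
noise = rng.standard_normal(365)
seasonal = np.sin(2 * np.pi * np.arange(365) / 30)

CRITICAL_5PCT_H10 = 18.31   # chi-squared, 10 degrees of freedom, alpha = 0.05

for name, data in [("white noise", noise), ("seasonal", seasonal)]:
    bp, lb = portmanteau(data, h=10)
    verdict = "consistent with white noise" if lb < CRITICAL_5PCT_H10 else "autocorrelated"
    print(f"{name:12s} Box-Pierce={bp:8.1f}  Ljung-Box={lb:8.1f}  -> {verdict}")
```

The Ljung-Box weighting (n + 2) / (n - k) is always greater than 1, so it is never smaller than Box-Pierce; it was introduced because it behaves better for short series.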
More on matplotlib Gems


In this chapter, we will cover: 

► Drawing barbs

► Making a box-and-whisker plot

► Making Gantt charts

► Making error bars

► Making use of text and font properties

► Rendering text with LaTeX

► Understanding the difference between pyplot and OO API


Introduction 


In this chapter, we will explore some less frequently used features of the matplotlib package. Some of these examples stretch matplotlib beyond its original target, but they show what can be done with a little creativity, and prove that matplotlib is full-featured and general-purpose.


Drawing barbs 


A barb is a representation of the speed and direction of wind, and is mainly used by meteorologists. In theory, barbs can be used to visualize any type of two-dimensional vector quantity. They are similar to arrows (quivers), but the difference is that arrows represent vector magnitude by the length of the arrow, while barbs give more information about the vector's magnitude by employing lines or triangles as increments of magnitude.



We will explain what barbs are, how to read them, and how to visualize them using Python and 
matplotlib. Here's a typical set of barbs: 


In the preceding diagram, the triangle, also known as a flag, represents the largest increment.

A full line, or barb, represents a smaller increment; a half line is the smallest increment.

The increments are 5, 10, and 50 for a half line, a full line, and a triangle, respectively. The values here represent, for meteorologists at least, wind speed in nautical miles per hour (knots).

We ordered the barbs from left to right to represent the following magnitudes: 0, 5, 10, 15, 30, 40, 50, 60, and 100 knots. The direction here is the same for each barb and is from north to south, because the east-west speed component is 0 for each barb.
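The way a speed is broken down into flags, full barbs, and half barbs can be sketched in plain Python. This mirrors our understanding of matplotlib's defaults (barb_increments of half=5, full=10, flag=50, and rounding=True, which rounds the speed to the nearest half increment); the function name barb_components is hypothetical.

```python
def barb_components(speed, half=5, full=10, flag=50, rounding=True):
    """Split a wind speed in knots into (flags, full barbs, half barbs)."""
    if rounding:
        speed = half * round(speed / half)   # snap to the nearest half increment
    flags, speed = divmod(speed, flag)       # largest increment first
    fulls, speed = divmod(speed, full)
    halves = 1 if speed >= half else 0       # at most one half barb remains
    return int(flags), int(fulls), halves


# the magnitudes of the barbs in the preceding diagram
for knots in (0, 5, 10, 15, 30, 40, 50, 60, 100):
    print(knots, barb_components(knots))
```

For instance, 65 knots becomes one flag, one full barb, and one half barb, (1, 1, 1), while 0 knots has no feathers at all and is drawn as the small "empty barb" circle.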


Getting ready 


A barb can be created using the matplotlib.pyplot.barbs function.

The barbs function accepts various arguments. At a minimum it needs U and V, and usually we also specify X and Y, the coordinates of the observed data points. U and V represent the east-west and north-south components of the vector, in knots.

Other arguments that can be useful are pivot, sizes, and various coloring arguments.

The pivot argument (pivot) specifies the part of the barb that is placed on the grid point; the barb rotates around this point. The barb can rotate around its tip or its middle, so 'tip' and 'middle' are the valid values for the pivot argument.

Because barbs consist of several parts, we can set up the coloring of any of those parts. So, we have a few color-related arguments that we can set up:

► barbcolor: This defines the color of all the parts of a barb, except for flags

► flagcolor: This defines the color of any flag on the barb

► facecolor: This argument is used if none of the preceding color arguments are specified (or the default value is read from rcParams)

If any of the preceding color-related arguments are specified, the facecolor argument is overridden. The facecolor argument is the one used in coloring polygons.
The size argument (sizes) specifies the ratio of a feature to the length of the barb. It is a dictionary of coefficients that can be specified using any or all of the following keys:

► spacing: This defines the space among features of the flag/barb 

► height: This defines the distance from the shaft to the top of a flag or barb 

► width: This defines the width of a flag 

► emptybarb: This defines the circle radius used for low magnitudes 


How to do it... 


Let's demonstrate how to use the barbs function by performing the following steps:

1. Generate a grid of coordinates to simulate observations. 

2. Simulate observational values for wind speed. 

3. Plot barb diagrams. 

4. Plot quivers to demonstrate different appearances. 

The following code will generate the figure: 

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-20, 20, 8)
y = np.linspace(0, 20, 8)

# make 2D coordinates
X, Y = np.meshgrid(x, y)

# simulated wind components (in knots), derived from the coordinates
U, V = X + 25, Y - 35

# plot the barbs
plt.subplot(1, 2, 1)
plt.barbs(X, Y, U, V, flagcolor='green', alpha=0.75)
plt.grid(True, color='gray')

# compare that with quiver / arrows
plt.subplot(1, 2, 2)
plt.quiver(X, Y, U, V, facecolor='red', alpha=0.75)

# misc settings
plt.grid(True, color='grey')

plt.show()
The preceding code renders two subplots, as shown in the following figure:




How it Works... 


To illustrate how the same data can bring different information to light, we used matplotlib's barbs and quiver plots to visualize simulated observed wind data.

First, we used NumPy to generate samples of variations for the x and y arrays. Then we used NumPy's meshgrid() function to create a 2D grid of coordinates where our observed data is sampled. Finally, U and V are the wind speed components in the EW (east-west) and NS (north-south) directions in knots (nautical miles per hour). For the purpose of the recipe, we derived these values by adjusting the already available X and Y matrices.

We then divided the figure into two subplots, plotting barbs in the leftmost plot and arrows (quiver) in the rightmost plot. We adjusted the color and transparency of both subplots slightly, and turned on the grid in both subplots.


There's more... 


This is all fine in the northern hemisphere, where the wind rotates in a counter-clockwise direction and the feathers (triangles, full lines, and half lines of the barb) point in the direction of lower pressure. In the southern hemisphere, this is inverted, so our wind barb graph would not represent the data we are visualizing correctly.

We have to invert the direction of the feathers. Luckily, the barbs function has the flip_barb argument. This argument can be a single Boolean value (True or False) or a sequence of Boolean values of the same shape as the other data arrays, where each item in the sequence specifies whether to flip the corresponding barb.
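Here is a sketch of a per-barb flip, assuming a recent matplotlib release (current documentation describes an array-valued flip_barb; very old releases may accept only a single Boolean). The grid and wind components are made up, with latitudes spanning both hemispheres, and every barb south of the equator gets flip_barb=True. The call also exercises pivot, sizes, barbcolor, and flagcolor from the Getting ready section. The Agg backend renders off-screen; drop the matplotlib.use() call and add plt.show() to view the figure interactively.

```python
import matplotlib
matplotlib.use("Agg")               # off-screen rendering; remove to use a GUI backend
import matplotlib.pyplot as plt
from matplotlib.quiver import Barbs
import numpy as np

# Made-up observation grid spanning both hemispheres (latitudes -40..40).
lon = np.linspace(-20, 20, 5)
lat = np.linspace(-40, 40, 5)
LON, LAT = np.meshgrid(lon, lat)
U, V = LON + 25, LAT - 35            # made-up wind components in knots

# One Boolean per barb: flip the feathers south of the equator.
southern = LAT < 0

fig, ax = plt.subplots()
barbs = ax.barbs(LON.ravel(), LAT.ravel(), U.ravel(), V.ravel(),
                 flip_barb=southern.ravel(),
                 pivot="middle",
                 sizes=dict(emptybarb=0.25, spacing=0.2, height=0.3),
                 barbcolor="black", flagcolor="green")
ax.axhline(0, color="gray", linestyle="--")   # mark the equator
ax.grid(True, color="gray")
fig.canvas.draw()                    # force a render to check that everything draws
```

Passing flattened (1D) arrays keeps the flip mask aligned with U and V element by element; a single flip_barb=True would instead flip every barb, which suits a dataset that lies entirely in the southern hemisphere.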
Making a box-and-whisker plot


Do you want to visualize a series of data measurements (or observations) to show several properties of the data series (such as the median value, the spread of the data, and the distribution of the data) in one plot? And would you want to do that in a way that lets you visually compare several similar data series? How would you visualize them? Welcome to the box-and-whisker plot! It is probably the best plot type for comparing distributions, if you are talking to people used to information density.

Examples of box-and-whisker plot usage range from comparing test scores between schools to comparing process parameters before and after changes (optimization).


Getting ready 


What are the elements of box-and-whisker plots? As we see in the following diagram, several important elements carry information in the box-and-whisker plot. The first component is the box, which carries information about the interquartile range, going from the lower to the upper quartile value. The median value of the data is represented by a line across the box.

The whiskers extend from both sides of the box: the lower whisker from the first quartile (25th percentile) and the upper whisker from the third quartile (75th percentile). By default, each whisker reaches the most extreme data point that lies within 1.5 times the interquartile range from the edge of the box. In the case of a normal distribution, the whiskers will cover about 99.3% of the data.

If there are values outside the whisker range, they will be displayed as fliers. Otherwise, the whiskers will cover the total range of the data.

Optionally, the box can also carry information about a confidence interval around the median. This is represented by a notch in the box. If the notches of two boxes do not overlap, this suggests that the medians of the two series differ. However, this is not rigorous and is just an indication that can be visually inspected.
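To make these elements concrete, here is a plain-Python sketch that computes them for process "A" from the recipe below. It assumes matplotlib's defaults as we understand them: quartiles by linear interpolation (NumPy's default percentile method), whiskers at 1.5 × IQR, and a notch of median ± 1.57 × IQR / sqrt(n). The helper names are our own.

```python
import math


def percentile(sorted_data, p):
    """p-th percentile by linear interpolation between the closest ranks."""
    pos = (len(sorted_data) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


def box_stats(data, whis=1.5):
    """Return the numbers a box-and-whisker plot is drawn from."""
    x = sorted(data)
    q1, med, q3 = (percentile(x, p) for p in (25, 50, 75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    notch = 1.57 * iqr / math.sqrt(len(x))
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        # whiskers stop at the most extreme points inside the fences
        "whislo": min(v for v in x if v >= low_fence),
        "whishi": max(v for v in x if v <= high_fence),
        "fliers": [v for v in x if v < low_fence or v > high_fence],
        "notch": (med - notch, med + notch),
    }


stats = box_stats([12, 15, 23, 24, 30, 31, 33, 36, 50, 73])
for key, value in stats.items():
    print(key, value)
```

For process "A", the box spans 23.25 to 35.25 with a median of 30.5, the whiskers stop at 12 and 50, and 73 is drawn as a flier.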
How to do it...


In the following recipe, you will learn how to create a box-and-whisker plot using matplotlib. We will perform the following steps:

1. Sample some comparative process data, where a single integer number represents the occurrence of errors during the observed period of the running process.

2. Read data from the processes dictionary into DATA.

3. Read labels from the processes dictionary into LABELS.

4. Render the box-and-whisker plot using matplotlib.pyplot.boxplot.

5. Remove some chart junk from the figure.

6. Add axes labels.

7. Show the figure.

The following code implements these steps:

import matplotlib.pyplot as plt

# define data
PROCESSES = {
    "A": [12, 15, 23, 24, 30, 31, 33, 36, 50, 73],
    "B": [6, 22, 26, 33, 35, 47, 54, 55, 62, 63],
    "C": [2, 3, 6, 8, 13, 14, 19, 23, 60, 69],
    "D": [1, 22, 36, 37, 45, 47, 48, 51, 52, 69],
}

# boxplot() needs a sequence of datasets, so convert the dict views to lists
DATA = list(PROCESSES.values())
LABELS = list(PROCESSES.keys())

plt.boxplot(DATA, notch=False, widths=0.3)

# set ticklabel to process name
plt.gca().xaxis.set_ticklabels(LABELS)

# some clean up (removing chartjunk)

# turn the spine off
for spine in plt.gca().spines.values():
    spine.set_visible(False)

# turn all ticks for x-axis off
plt.gca().xaxis.set_ticks_position('none')

# leave left ticks for y-axis on
plt.gca().yaxis.set_ticks_position('left')

# set axes labels
plt.ylabel("Errors observed over defined period.")
plt.xlabel("Process observed over defined period.")

plt.show()

The preceding code generates the following figure:



How it Works... 


The box-and-whisker plot is rendered by first computing quartiles for the given data in DATA.

These quartile values are used to compute the lines that draw the boxes and whiskers.

We adjusted the plot by removing all the unnecessary lines (superfluous lines referred to as chartjunk in the famous book The Visual Display of Quantitative Information, by Edward R. Tufte). Those lines do not carry information and just put more pressure on the mental model in the viewer's brain, which has to decode all the lines before discovering the really valuable information.


Making Gantt charts


One very widely used visualization of time-based data is the Gantt chart. Named after the mechanical engineer Henry Gantt, who invented it in the 1910s, it is almost exclusively used to visualize work breakdown structures in project management. This chart is loved by managers for its descriptive value and not so loved by employees, especially when the project deadline is near.

This kind of chart is very straightforward; almost everyone can understand and read it, even if it is overloaded with additional (related and unrelated) information.

A basic Gantt chart has a time series on the X axis and a set of labels that represent tasks or subtasks on the Y axis. Task duration is usually visualized either as a line or as a bar, extending from the start to the end time of a given task.

If subtasks are present, one or many subtasks have a parent task, in which case the total time of a task is aggregated from its subtasks in such a way that overlapping and gap time is accounted for.
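Aggregating subtasks into a parent task boils down to merging overlapping time intervals. The function below is a hypothetical helper, not part of the Gantt class in this recipe: it returns both the span of the parent task (first start to last end, gaps included) and the time actually covered by subtasks (overlaps counted once). It assumes at least one subtask.

```python
from datetime import datetime, timedelta


def aggregate_subtasks(intervals):
    """Return (span, busy) for a parent task built from (start, end) subtasks."""
    ordered = sorted(intervals)
    merged = [list(ordered[0])]
    for start, end in ordered[1:]:
        if start <= merged[-1][1]:                 # overlaps the current block
            merged[-1][1] = max(merged[-1][1], end)
        else:                                      # a gap: start a new block
            merged.append([start, end])
    busy = sum((end - start for start, end in merged), timedelta())
    span = merged[-1][1] - merged[0][0]
    return span, busy


day = datetime(2013, 10, 2)
subtasks = [
    (day.replace(hour=9), day.replace(hour=12)),
    (day.replace(hour=11), day.replace(hour=14)),  # overlaps the first subtask
    (day.replace(hour=15), day.replace(hour=16)),  # after a one-hour gap
]
span, busy = aggregate_subtasks(subtasks)
print("span:", span, "busy:", busy, "gap:", span - busy)
```

Here the parent task spans 7 hours, subtasks keep it busy for 6 of them, and the remaining hour is the gap between 14:00 and 15:00.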

So, in this recipe, we will be covering the creation of a Gantt chart using Python.


Getting ready 


There are many full-fledged software applications and services that allow you to make very flexible and complicated Gantt charts. We will try to demonstrate how you could do it in pure Python, without relying on external applications, yet achieving neat-looking and informative Gantt charts.

The Gantt chart shown in the example does not support nested tasks, but it is sufficient for simple work breakdown structures.


How to do it... 


The following code example will allow us to demonstrate how Python can be used together with matplotlib to render a Gantt chart. We will perform the following steps:

1. Load TEST_DATA, which contains a set of tasks, and instantiate the Gantt class with TEST_DATA.

2. Each task contains a label and the start and end time.

3. Process all tasks by plotting horizontal bars on the axes.

4. Format the x and y axes for the data we are rendering.

5. Tighten the layout.

6. Show the Gantt chart.
The following is the sample code:

from datetime import datetime
import logging
import sys

import numpy as np

import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import matplotlib.dates as mdates


class Gantt(object):
    '''
    Simple Gantt renderer.

    Uses *matplotlib* rendering capabilities.
    '''

    # Red Yellow Green diverging colormap
    # from http://colorbrewer2.org/
    RdYlGr = ['#d73027', '#f46d43', '#fdae61',
              '#fee08b', '#ffffbf', '#d9ef8b',
              '#a6d96a', '#66bd63', '#1a9850']

    POS_START = 1.0
    POS_STEP = 0.5

    def __init__(self, tasks):
        self._fig = plt.figure()
        self._ax = self._fig.add_axes([0.1, 0.1, .75, .5])

        # reverse the tasks so that the first task is drawn at the top
        self.tasks = tasks[::-1]

    def _format_date(self, date_string):
        '''
        Formats string representation of *date_string* into *matplotlib.dates*
        instance.
        '''
        try:
            date = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
        except ValueError as err:
            logging.error("String '{0}' can not be converted to datetime object: {1}"
                          .format(date_string, err))
            sys.exit(-1)
        mpl_date = mdates.date2num(date)
        return mpl_date

    def _plot_bars(self):
        '''
        Processes each task and adds *barh* to the current *self._ax* (*axes*).
        '''
        for i, task in enumerate(self.tasks):
            start = self._format_date(task['start'])
            end = self._format_date(task['end'])
            bottom = (i * Gantt.POS_STEP) + Gantt.POS_START
            width = end - start
            # cycle through the colormap if there are more tasks than colors
            color = Gantt.RdYlGr[i % len(Gantt.RdYlGr)]
            self._ax.barh(bottom, width, left=start, height=0.3,
                          align='center', label=task['label'],
                          color=color)

    def _configure_yaxis(self):
        '''y axis'''
        task_labels = [t['label'] for t in self.tasks]
        pos = self._positions(len(task_labels))
        self._ax.set_yticks(pos)
        ylabels = self._ax.set_yticklabels(task_labels)
        plt.setp(ylabels, size='medium')

    def _configure_xaxis(self):
        '''x axis'''
        # make x axis date axis
        self._ax.xaxis_date()

        # format date to ticks on every 7 days
        rule = mdates.rrulewrapper(mdates.DAILY, interval=7)
        loc = mdates.RRuleLocator(rule)
        formatter = mdates.DateFormatter("%d %b")

        self._ax.xaxis.set_major_locator(loc)
        self._ax.xaxis.set_major_formatter(formatter)
        xlabels = self._ax.get_xticklabels()
        plt.setp(xlabels, rotation=30, fontsize=9)

    def _configure_figure(self):
        self._configure_xaxis()
        self._configure_yaxis()

        self._ax.grid(True, color='gray')
        self._set_legend()
        self._fig.autofmt_xdate()

    def _set_legend(self):
        '''
        Tweak font to be small and place *legend*
        in the upper right corner of the figure
        '''
        font = font_manager.FontProperties(size='small')
        self._ax.legend(loc='upper right', prop=font)

    def _positions(self, count):
        '''
        For given *count* number of positions, get array for the positions.
        '''
        end = count * Gantt.POS_STEP + Gantt.POS_START
        pos = np.arange(Gantt.POS_START, end, Gantt.POS_STEP)
        return pos

The main method that drives the Gantt chart generation is defined in the following code. In this method, we plot the bars from the data loaded into the instance, set up the date formatter for the time axis (x axis), and set values for the y axis (the project's tasks).

    def show(self):
        self._plot_bars()
        self._configure_figure()
        plt.show()


if __name__ == '__main__':
    TEST_DATA = (
        {'label': 'Research', 'start': '2013-10-01 12:00:00', 'end': '2013-10-02 18:00:00'},  # @IgnorePep8
        {'label': 'Compilation', 'start': '2013-10-02 09:00:00', 'end': '2013-10-02 12:00:00'},  # @IgnorePep8
        {'label': 'Meeting #1', 'start': '2013-10-03 12:00:00', 'end': '2013-10-03 18:00:00'},  # @IgnorePep8
        {'label': 'Design', 'start': '2013-10-04 09:00:00', 'end': '2013-10-10 13:00:00'},  # @IgnorePep8
        {'label': 'Meeting #2', 'start': '2013-10-11 09:00:00', 'end': '2013-10-11 13:00:00'},  # @IgnorePep8
        {'label': 'Implementation', 'start': '2013-10-12 09:00:00', 'end': '2013-10-22 13:00:00'},  # @IgnorePep8
        {'label': 'Demo', 'start': '2013-10-23 09:00:00', 'end': '2013-10-23 13:00:00'},  # @IgnorePep8
    )

    gantt = Gantt(TEST_DATA)
    gantt.show()

This code will render a simple, neat looking Gantt chart like the following one: 


[Figure: Gantt chart with the tasks Research, Compilation, Meeting #1, Design, Meeting #2, Implementation, and Demo]


How it Works... 


We can start reading the preceding code from the bottom, after the condition that checks whether we are in "__main__".

After we instantiate the Gantt class, giving it TEST_DATA, we set up the necessary fields of our instance. We save the tasks, in reverse order, in the self.tasks field, and we create our figure and axes to hold the charts we create later.

Then, we call show() on the instance, which walks us through the steps required to render the Gantt chart:

def show(self):
    self._plot_bars()
    self._configure_figure()
    plt.show()

Plotting bars requires iteration, where we apply the data about the name and duration of each task to the matplotlib.pyplot.barh function, adding it to the axes at self._ax. We place each task in a separate channel by giving it a different (incremental) bottom argument value.
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Also, to make it easy to map tasks to their names, we cycle over the divergent color maps that 
we generated using the colorbrewer2 . org tool. 

The next step is to configure the figure, which means that we set up the date format on the x axis, and the tick positions and labels on the y axis to match the tasks plotted by matplotlib.pyplot.barh.

Finally, we add a grid and a legend. 

At the end, we call plt.show() to show the figure.
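The bar-plotting and axis-configuration steps can be sketched with Axes.barh on dates converted to matplotlib's numeric date units (days). This is a minimal sketch under our own assumptions, not the recipe's Gantt class: the task_spans helper is hypothetical, only three of the TEST_DATA tasks are used, and the Agg backend is selected so it runs without a display. Recent matplotlib versions take each task's lane as the first (y) argument of barh rather than as bottom.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# A subset of the recipe's TEST_DATA
TASKS = [
    {'label': 'Research', 'start': '2013-10-01 12:00:00', 'end': '2013-10-02 18:00:00'},
    {'label': 'Compilation', 'start': '2013-10-02 09:00:00', 'end': '2013-10-02 12:00:00'},
    {'label': 'Meeting #1', 'start': '2013-10-03 12:00:00', 'end': '2013-10-03 18:00:00'},
]
FMT = '%Y-%m-%d %H:%M:%S'


def task_spans(tasks):
    """Hypothetical helper: (lane, left, width) per task, left/width in days."""
    spans = []
    for lane, task in enumerate(tasks):
        start = mdates.date2num(datetime.strptime(task['start'], FMT))
        end = mdates.date2num(datetime.strptime(task['end'], FMT))
        spans.append((lane, start, end - start))
    return spans


fig, ax = plt.subplots()
spans = task_spans(TASKS)
for lane, left, width in spans:
    # each task gets its own horizontal lane at y=lane
    ax.barh(lane, width, left=left, height=0.5, align='center')

ax.set_yticks([lane for lane, _, _ in spans])
ax.set_yticklabels([task['label'] for task in TASKS])
ax.xaxis_date()  # interpret x values as dates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.grid(True)
```

Calling plt.show() or fig.savefig() afterwards displays or saves the chart.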


Making error bars 


Error bars are useful to display the dispersion of data on a plot. They are relatively simple as a form of visualization; however, they are also a bit problematic because what is shown as an error varies across different sciences and publications. This does not lessen the usefulness of error bars; it just imposes the need to always be careful and explicitly state the nature of the error visualized as an error bar.


Getting ready 


To plot an error bar for raw observed data, we need to compute the mean and the error we want to display.

The error we compute is the 95% confidence interval of the mean. It tells us how stable the mean of our observations is, that is, how good an estimate of the whole population's mean our observations provide: the narrower the interval, the more reliable the estimate.

Matplotlib supports this type of plot via the matplotlib.pyplot.errorbar function.

It offers several kinds of error bars. They can be vertical (yerr) or horizontal (xerr) and 
symmetrical or asymmetrical. 
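The variants listed above can be compared side by side in a short sketch. The sample values below are made up for illustration; with yerr given as a 2×N array, the first row is the distance below each point and the second row the distance above it, and fmt='none' draws only the error bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(4)
y = np.array([2.0, 5.5, 4.8, 4.3])  # illustrative values, not computed data

# symmetric vertical errors: one half-width per point
sym = np.array([0.3, 0.4, 0.6, 0.5])
# asymmetric vertical errors: row 0 = distance below, row 1 = distance above
asym = np.array([[0.2, 0.3, 0.5, 0.4],
                 [0.6, 0.5, 0.2, 0.7]])

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
sym_bars = ax1.errorbar(x, y, yerr=sym, fmt='o', capsize=4)
ax1.set_title('symmetric yerr')
# horizontal errors via xerr; fmt='none' hides the data markers
asym_bars = ax2.errorbar(x, y, yerr=asym, xerr=0.2, fmt='none', capsize=4)
ax2.set_title('asymmetric yerr + xerr')
```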


How to do it... 


In the following code we will: 

1. Use some sample data that consists of four sets of observations. 

2. For each set of observations, compute the mean value. 

3. For each set of observations, compute the 95% confidence interval. 

4. Render bars with vertical symmetrical error bars. 


Here is the code for this:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sc


TEST_DATA = np.array([[1,2,3,2,1,2,3,4,2,3,2,1,2,3,4,4,3,2,3,2,3,2,1],
                      [5,6,5,4,5,6,7,7,6,7,7,2,8,7,6,5,5,6,7,7,7,6,5],
                      [9,8,7,8,8,7,4,6,6,5,4,3,2,2,2,3,3,4,5,5,5,6,1],
                      [3,2,3,2,2,2,2,3,3,3,3,4,4,4,4,5,6,6,7,8,9,8,5],
                      ])


# find mean for each of our observations
y = np.mean(TEST_DATA, axis=1, dtype=np.float64)
# and the 95% confidence interval: a half-width of 1.96 standard errors
# around each mean, which errorbar() draws symmetrically as yerr
ci95 = 1.96 * sc.sem(TEST_DATA, axis=1)

# each set is one try
tries = np.arange(0, len(y), 1.0)

# tweak grid and set up labels
plt.grid(True, alpha=0.5)
plt.gca().set_xlabel('Observation #')
plt.gca().set_ylabel('Mean (+- 95% CI)')
plt.title("Observations with corresponding 95% CI as error bar.")

plt.bar(tries, y, align='center', alpha=0.2)
# fmt='none' draws only the error bars, without data markers or a line
plt.errorbar(tries, y, yerr=ci95, fmt='none')

plt.show()

The preceding code will render a plot with error bars that display 95% confidence intervals as whiskers extending along the y axis. Remember, the wider the whiskers, the less precisely the observed mean estimates the true population mean. The following graph is the output for the preceding code:
[Figure: bar chart titled "Observations with corresponding 95% CI as error bar.", one bar with an error bar per observation set, x axis labeled "Observation #"]


How it Works... 


In order to avoid iterating over each set of observations, we use NumPy's vectorized methods to compute means and standard errors, which we use for plotting and for computing error values.

Using NumPy's vectorized implementations, which are written in C (and called from Python), allows us to speed up computations by several orders of magnitude.

This is not very important for a few data points but, for millions of data points, it can make or break our efforts to create responsive applications.
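To check that the vectorized calls compute the same thing as an explicit loop, the following sketch compares both on two of the recipe's observation sets. It only verifies equivalence; actual speedups depend on data size and hardware, so no timings are quoted.

```python
import numpy as np
import scipy.stats as sc

# Two of the recipe's observation sets (rows), 23 values each
data = np.array([[1, 2, 3, 2, 1, 2, 3, 4, 2, 3, 2, 1, 2, 3, 4, 4, 3, 2, 3, 2, 3, 2, 1],
                 [5, 6, 5, 4, 5, 6, 7, 7, 6, 7, 7, 2, 8, 7, 6, 5, 5, 6, 7, 7, 7, 6, 5]])

# explicit Python loop: one row at a time
loop_means = [sum(row) / len(row) for row in data]
loop_sems = [sc.sem(row) for row in data]

# vectorized: one call per statistic, working along axis=1 (across each row)
vec_means = np.mean(data, axis=1, dtype=np.float64)
vec_sems = sc.sem(data, axis=1)
```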

Also, you may note that we explicitly specified dtype=np.float64 in the np.mean function call. According to the official NumPy documentation reference (http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html), np.mean can be inaccurate if computed in single precision (np.float32); specifying a higher-precision accumulator with dtype=np.float64 alleviates this, at some cost in performance.
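The precision issue can be reproduced with a sketch modeled on the example in the numpy.mean reference: float32 input accumulates in float32 by default, while dtype=np.float64 requests a double-precision accumulator. The exact float32 result depends on the NumPy version and its summation strategy, so we only check the float64 one.

```python
import numpy as np

# float32 data: half ones, half 0.1 (0.1 is not exactly representable in binary)
a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

mean32 = np.mean(a)                    # accumulates in float32, the input precision
mean64 = np.mean(a, dtype=np.float64)  # accumulates in float64
```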
There's more...


There is an ongoing debate about what to show on error bars. Some advise using SD, 2SD, SE, or 95% CI. We must understand the difference between all these values and what they are used for, in order to be able to reason about what to use and when.

Standard deviation (SD) informs us about the distribution of individual data points around the mean value. If we assume a normal distribution, then we know that 68.2% (~2/3) of data values will fall within ±SD of the mean, and 95.4% of values will fall within ±2*SD.

Standard error (SE) is calculated as SD divided by the square root of N (SD/√N), where N is the number of data points. SE informs us about the variability of mean values if we were able to perform the same sampling more than once (like performing the same study hundreds of times).

The confidence interval is calculated from SE, similar to how the range of values is calculated from the standard deviation. To calculate the 95% confidence interval, we must add/subtract 1.96 * SE to/from our mean value; in proper notation, 95% CI = M ± (1.96 * SE). The wider the confidence interval, the less sure we can be that our estimate is right.
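The three quantities can be computed directly with NumPy and cross-checked against scipy.stats.sem, which the recipe uses. This sketch takes one observation set from TEST_DATA and assumes the sample standard deviation (ddof=1), which is what scipy.stats.sem uses by default; np.std defaults to ddof=0. The 1.96 multiplier comes from the normal distribution, as in the formula above.

```python
import numpy as np
import scipy.stats as sc

# one observation set from the recipe's TEST_DATA
obs = np.array([9, 8, 7, 8, 8, 7, 4, 6, 6, 5, 4, 3, 2, 2, 2, 3, 3, 4, 5, 5, 5, 6, 1],
               dtype=np.float64)
n = obs.size
mean = obs.mean()
sd = obs.std(ddof=1)   # sample standard deviation (N - 1 in the denominator)
se = sd / np.sqrt(n)   # standard error of the mean: SD / sqrt(N)
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se  # 95% CI = M +/- 1.96 * SE
```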

We see that in order to be sure that our estimation is correct, and to give our reader evidence of it, we should display the confidence interval, which in turn carries the standard error; a small standard error indicates that our means are stable.


Making use of text and font properties 


You already learned how to annotate the plot by adding legends, but sometimes we want to do more with text. This recipe will explain and demonstrate more features of text manipulation in matplotlib, providing a powerful toolkit for even advanced typesetting needs.

We will not cover LaTeX support in this recipe, as there is a recipe named Rendering text with LaTeX in this chapter.


Getting ready 


We start with a list of the most useful set of functions that matplotlib offers. Most of the functions are available via the pyplot module's interface, but we map them to their origin functions here to allow you to explore more if a particular text feature is not covered in this recipe.

Basic text manipulations and their mapping in the matplotlib OO API are presented in the following table:
Each entry gives the matplotlib.pyplot function, its Matplotlib API counterpart in parentheses, and a description:

► text (matplotlib.axes.Axes.text): Adds text to the axes at the location specified by (x, y). The fontdict argument allows us to override generic font properties, or we can use kwargs to override a specific property.

► xlabel (matplotlib.axes.Axes.set_xlabel): Sets the label for the x axis. The labelpad argument specifies the spacing between the label and the x axis.

► ylabel (matplotlib.axes.Axes.set_ylabel): Similar to xlabel, but intended for the y axis.

► title (matplotlib.axes.Axes.set_title): Sets the title for the axes. Accepts all the usual text properties, such as fontdict and kwargs.

► suptitle (matplotlib.figure.Figure.suptitle): Adds a centered title to the figure. Accepts all the usual text properties via kwargs. Uses figure coordinates.

► figtext (matplotlib.figure.Figure.text): Puts text anywhere on the figure. The location is defined using (x, y) in the figure's normalized coordinates. Font properties can be overridden using fontdict, and kwargs are also supported to override any text-related property.


The base class for storing and drawing text in window or data coordinates is the matplotlib.text.Text class. It supports defining the location of text objects as well as a range of properties that tune how our strings are going to appear on a figure or a window.
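The OO methods from the preceding table can be exercised in one short sketch, where each call stands in for its pyplot counterpart. The strings, coordinates, and the labelpad value are arbitrary; fig.text uses normalized figure coordinates, while ax.text uses data coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.text import Text

fig, ax = plt.subplots()

# OO counterparts of plt.text, plt.xlabel, plt.ylabel, plt.title,
# plt.suptitle, and plt.figtext
t = ax.text(0.5, 0.5, 'data coordinates', fontdict={'size': 12}, color='navy')
ax.set_xlabel('x label', labelpad=10)  # labelpad: gap between label and axis
ax.set_ylabel('y label')
ax.set_title('axes title')
fig.suptitle('figure title')                     # centered, figure coordinates
ft = fig.text(0.02, 0.02, 'figure coordinates')  # (0, 0) is bottom left
```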


The font properties supported by matplotlib.text.Text instances are:


Each entry gives the property, its accepted values in parentheses, and a description:

► family ('serif', 'sans-serif', 'cursive', 'fantasy', 'monospace'): Specifies the font name or font family. If this is a list, then it is ordered by priority, so the first matched name will be used.

► size or fontsize (12, 10, ... or 'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'): Specifies the size in absolute points, or a relative size as a size string.

► style or fontstyle ('normal', 'italic', 'oblique'): Specifies the font style as a string.

► variant ('normal', 'small-caps'): Specifies the font variant.

► weight or fontweight (0-1000 or 'ultralight', 'light', 'normal', 'regular', 'book', 'medium', 'roman', 'semibold', 'demibold', 'demi', 'bold', 'heavy', 'extra bold', 'black'): Specifies the font weight as a numeric value or as a weight string. Font weight is defined as the thickness of the character outline relative to its height.

► stretch or fontstretch (0-1000 or 'ultra-condensed', 'extra-condensed', 'condensed', 'semi-condensed', 'normal', 'semi-expanded', 'expanded', 'extra-expanded', 'ultra-expanded'): Specifies the stretch of the font. Stretch is defined as horizontal condensation or expansion. This property is not currently implemented.

► fontproperties: Defaults to the matplotlib.font_manager.FontProperties instance. This class stores and manages font properties as described in the W3C CSS Level 1 specification at http://www.w3.org/TR/1998/REC-CSS2-19980512/.
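Instead of passing family, style, weight, and size one by one, they can be bundled into a single FontProperties object and handed to a Text instance via fontproperties. The values below are arbitrary picks from the preceding list.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# one FontProperties object bundles several font settings
fp = FontProperties(family='serif', style='italic', weight='bold', size='large')

fig, ax = plt.subplots()
# pass the whole bundle instead of separate keyword arguments
t = ax.text(0.1, 0.5, 'bold italic serif, large', fontproperties=fp)
```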


We can also specify the background box that will contain the text (via the bbox argument), and this box can be further specified in color, borders, and transparency.

The basic text color is read from rcParams['text.color'] if it is not specified on the current instance.

Specified text can also be aligned according to visual needs. There are the following alignment properties:

► horizontalalignment or ha: This allows alignment of text horizontally to center, left, and right.

► verticalalignment or va: The allowed values for this are center, top, bottom, and baseline.

► multialignment: This allows alignment of text strings that span multiple lines. The allowed values are left, right, and center.
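The alignment properties, the background box, and the default color can be seen together in the following sketch. The anchor point, box colors, and strings are arbitrary; the red + marks the (x, y) point so you can see how ha and va position the text relative to it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0.5], [0.5], 'r+')  # marks the anchor point (x, y)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

# ha/va place the whole text block relative to (x, y);
# multialignment aligns the lines within a multi-line string
t = ax.text(0.5, 0.5, 'first line\na much longer second line',
            ha='center', va='top', multialignment='right',
            bbox=dict(facecolor='lightyellow', edgecolor='gray', alpha=0.8))

# no color given, so the text falls back to rcParams['text.color']
plain = ax.text(0.05, 0.9, 'default color')
```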
How to do it...


So far, so good, but it is hard to visualize all these variations in the fonts we can create. So, this recipe is going to illustrate what we can do. In the next code, we will perform the following steps:

1. List all the possible properties we want to vary on the font.

2. Iterate over the first set of variations: font family and size.

3. Iterate over the second set of variations: weight and style.

4. Render text samples for both iterations and print the variation combination as text on the plot.

5. Remove axes from the figure, as they serve no purpose.

The following is the code:

import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# properties:
families = ['serif', 'sans-serif', 'cursive', 'fantasy', 'monospace']
sizes = ['xx-small', 'x-small', 'small', 'medium', 'large',
         'x-large', 'xx-large']
styles = ['normal', 'italic', 'oblique']
weights = ['light', 'normal', 'medium', 'semibold', 'bold', 'heavy',
           'black']
variants = ['normal', 'small-caps']

fig = plt.figure(figsize=(9, 17))
ax = fig.add_subplot(111)
ax.set_xlim(0, 9)
# text does not autoscale the axes, so make room for all samples (y up to 16.5)
ax.set_ylim(0, 17)

# VAR: FAMILY, SIZE
y = 0
size = sizes[0]
style = styles[0]
weight = weights[0]
variant = variants[0]

for family in families:
    x = 0
    y = y + .5
    for size in sizes:
        y = y + .4
        sample = family + " " + size
        ax.text(x, y, sample, family=family, size=size,
                style=style, weight=weight, variant=variant)

# VAR: STYLE, WEIGHT
y = 0
family = families[0]
size = sizes[4]
variant = variants[0]

for weight in weights:
    x = 5
    y = y + .5
    for style in styles:
        y = y + .4
        sample = weight + " " + style
        ax.text(x, y, sample, family=family, size=size,
                style=style, weight=weight, variant=variant)

ax.set_axis_off()
plt.show()
The preceding code will produce the following screenshot:


[Figure: two columns of text samples. Left: each font family (serif, sans-serif, cursive, fantasy, monospace) at each size from xx-small to xx-large. Right: each weight from light to black in normal, italic, and oblique styles]
How it Works...


The code is really straightforward, as we just iterate twice over lists of properties, printing their values.

The only trick employed here is the positioning of text on the figure canvas, as that allows us to have a nice layout of text samples that we can easily compare.

Keep in mind that the default font matplotlib will use depends on the operating system you are running, so the preceding screenshot might look slightly different. This screenshot was rendered using the standard fonts installed with Ubuntu 13.04.


Rendering text with LaTeX 


If we want to plot more scientific graphics and explain math as it should be explained, using scientific notation and complex equations on the figures, we need support from the best.

Although matplotlib has support for math text rendering, the best support comes from the LaTeX community, which has proven itself in this task over many decades.

LaTeX is a high-quality typesetting system for the production of scientific and technical documentation, and is the de facto standard for scientific typesetting and publication. It is free software, available on the majority of desktop platforms used today as prepackaged binary installations; hence, it is easy to install.

The basic syntax of LaTeX is similar to markup languages; so, to produce satisfactory content, one writes focusing more on the structure than on the look and style. For example:

\documentclass{article}
\title{This here is a title of my document}
\author{Peter J. S. Smith}
\date{September 2013}
\begin{document}
\maketitle
Hello World, from LaTeX!
\end{document}

We see how this is different from the usual word processor, with its WYSIWYG editing environment where the style is already applied to your text. Sometimes this is good but, for scientific publications, style is a secondary concern; the primary focus is having the right, correct, and valid content. Here, by content, we also mean mathematical notation (usually a lot of it), including graphs.

Apart from this, there are many more features, such as automatic generation of bibliographies and indexes, which are important for medium to large publications. These are the main focus points of the LaTeX system.
Since this is not a book about LaTeX, we will stop with the quick introduction here. A lot more documentation is available on the project's website at http://latex-project.org/.


Getting ready 


Before we start demonstrating matplotlib's support for rendering text using LaTeX, we need to have the following packages installed on our system:

► LaTeX system: The most common one is the TeX Live prepackaged distribution

► DVI to PNG converter: This makes PNG graphics from DVI files as obtained from TeX, by producing anti-aliased screen-resolution images

► Ghostscript: This is required, unless already installed by the TeX Live distribution

There are different prepackaged systems of the LaTeX environment for different operating systems. For Linux-based systems, TeX Live is a complete TeX system. For Mac OS, the recommended environment is the MacTeX distribution; for the Windows environment, the proTeXt system is going to install all the TeX support, including LaTeX.

Whichever package you install, make sure it comes with font libraries and programs for typesetting, previewing, and printing TeX documents in many different languages.

We will install our package for Linux using the texlive and dvipng packages for Ubuntu. We can install these using the following command:

$ sudo apt-get install texlive dvipng

The next step is to tell matplotlib to use LaTeX by setting text.usetex to True. We can do that either in our custom .matplotlibrc inside our home directory (/home/<user>/.matplotlibrc on Unix-based systems, or C:\Documents and Settings\<user>\.matplotlibrc on Windows) via the text.usetex parameter, or by using the following code:

matplotlib.pyplot.rc('text', usetex=True) 

Placed at the start of the code, this call tells matplotlib to hand all text rendering over to LaTeX. It is important to do this before we add any figure and axes.

Not all backends support LaTeX rendering. Only the Agg, PS, and PDF backends support text rendering via LaTeX.


How to do it... 


What we want to do here is demonstrate the basic usage properties of LaTeX. We will perform the following steps:

1. Generate some sample data.

2. Set up matplotlib to use LaTeX for this plotting session.

3. Set up the font and font properties to be used.

4. Write out the equation syntax.

5. Demonstrate the usage of Greek symbols' syntax.

6. Draw math notations of fractions and binomial coefficients.

7. Write some limits and exponential expressions.

8. Write possible range expressions.

9. Write expressions with text and formatted text in them.

10. Write some math expressions in the x and y labels and in the figure title.

The following code will perform these steps:

import numpy as np
import matplotlib.pyplot as plt

# Example data
t = np.arange(0.0, 1.0 + 0.01, 0.01)
s = np.cos(4 * np.pi * t) * np.sin(np.pi*t/4) + 2

plt.rc('text', usetex=True)
plt.rc('font', **{'family': 'sans-serif', 'sans-serif': ['Helvetica'],
                  'size': 16})

plt.plot(t, s, alpha=0.25)

# first, the equation for 's'
# note the usage of Python's raw strings
plt.annotate(r'$\cos(4 \times \pi \times {t}) \times \sin(\pi \times \frac{t}{4}) + 2$',
             xy=(.9, 2.2), xytext=(.5, 2.6), color='red',
             arrowprops={'arrowstyle': '->'})

# some math alphabet
plt.text(.01, 2.7, r'$\alpha, \beta, \gamma, \Gamma, \pi, \Pi, \phi, \varphi, \Phi$')

# some equation
plt.text(.01, 2.5, r'some equations $\frac{n!}{k!(n-k)!} = {n \choose k}$')

# more equations
plt.text(.01, 2.3, r'EQ1 $\lim_{x \to \infty} \exp(-x) = 0$')

# some ranges...
plt.text(.01, 2.1, r'Ranges: $(a), [b], \{c\}, |d|, \|e\|, '
                   r'\langle f \rangle, \lfloor g \rfloor, \lceil h \rceil$')

# you can multiply apples and oranges
plt.text(.01, 1.9, r'Text: $50 apples \times 100 oranges = lots of juice$')
plt.text(.01, 1.7, r'More text formatting: $50 \textrm{ apples} \times '
                   r'100 \textbf{ apples} = \textit{lots of juice}$')
plt.text(.01, 1.5, r'Some indexing: $\beta = (\beta_1, \beta_2, \dotsc, \beta_n)$')

# we can also write on labels
plt.xlabel(r'\textbf{time} (s)')
plt.ylabel(r'\textit{y values} (W)')

# and write titles using LaTeX
plt.title(r"\TeX\ is Number "
          r"$\displaystyle\sum_{n=1}^\infty\frac{-e^{i\pi}}{2^n}$!",
          fontsize=16, color='gray')

# Make room for the ridiculously large title.
plt.subplots_adjust(top=0.8)

plt.savefig('tex_demo')
plt.show()

The preceding code will render the following text-saturated figure that demonstrates 
LaTeX rendering: 



[Figure: the curve s(t) with LaTeX-rendered text: the title "TeX is Number" followed by the infinite sum, the annotated equation for s, Greek letters, a binomial coefficient, a limit, range delimiters, formatted text, and indexing; the x axis is labeled "time (s)"]
How it Works...


After we set up the rendering engine and font properties, we basically used standard matplotlib calls for text rendering, such as matplotlib.pyplot.annotate, matplotlib.pyplot.text, matplotlib.pyplot.xlabel, matplotlib.pyplot.ylabel, and matplotlib.pyplot.title.

The difference here is that all the strings are so-called raw strings, meaning that Python will not interpret backslash escape sequences in them (without the r prefix, the \t in \times would become a tab character); hence, the LaTeX engine is going to receive exactly the same strings as commands to act upon.
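The effect of the r prefix is easy to check in plain Python. In this small sketch, a regular string turns \t (from \times) into a tab and \f (from \frac) into a form feed, while the raw string keeps both backslashes intact.

```python
# regular string: \t and \f are escape sequences (tab, form feed)
regular = '$\times \frac{t}{4}$'
# raw string: every backslash is kept for the TeX parser
raw = r'$\times \frac{t}{4}$'
# escaping each backslash by hand gives the same result as the raw string
escaped = '$\\times \\frac{t}{4}$'
```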

More examples of how to use TeX and how to integrate it in matplotlib can be found in the official matplotlib documentation at http://matplotlib.org/users/mathtext.html#writing-mathematical-expressions.

Note that this URL is not about LaTeX but about matplotlib's own integrated TeX parser (mathtext). This parser supports almost the same syntax, and it may even be sufficient for your needs.
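If a full LaTeX installation is not available, mathtext alone can render expressions. This sketch uses matplotlib.mathtext.math_to_image, which renders a single $...$ expression to an image file or file-like object; the expression and dpi are arbitrary, and no LaTeX installation is involved.

```python
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # headless backend
from matplotlib import mathtext

# rendered by matplotlib's own mathtext parser; text.usetex stays False
expr = r'$\alpha_i > \frac{\beta}{2}$'
buf = BytesIO()
mathtext.math_to_image(expr, buf, dpi=100, format='png')
png_bytes = buf.getvalue()
```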


There's more... 


If you run into a problem while setting up this environment, or have other problems with fonts that either look bad or do not produce the LaTeX rendering, make sure that you have installed all the required packages, that your PATH environment variable (if on Windows) is set up to include all the required binaries, and that matplotlib is set to use LaTeX for text rendering.

If all of the given instructions are followed and the results cannot be replicated, refer to the official matplotlib website at http://matplotlib.org/users/usetex.html#possible-hangups and the LaTeX community at http://tex.stackexchange.com/ for further assistance.

It is known that this setup is not as streamlined as it should be, and some quirks may occur for various reasons.


Understanding the difference between 
pyplot and OO API 


This recipe will try to explain some of the programming interfaces in matplotlib and compare pyplot with the object-oriented API (Application Programming Interface). Depending on the task at hand, this will help us decide why and when to use either of these interfaces.
Getting ready


When the matplotlib library was introduced, its story was similar to that of many open source projects: there was no proper (free) solution to the problem a person had, so he wrote one. The problem encountered with MATLAB® concerned performance for the task at hand (http://www.aosabook.org/en/matplotlib.html), and the original author already had knowledge of both MATLAB® and Python, so he started writing matplotlib as a solution for the needs of his current project.

This is the main reason matplotlib has a MATLAB®-like interface that allows one to quickly plot data without worrying about background details, such as which platform matplotlib is running on, what the underlying rendering libraries are (GTK, Qt, Tk, or wxWidgets, either on Linux or Windows), or whether we are running on Mac OS with the help of Cocoa toolkits. This is all hidden inside matplotlib under a nice procedural interface in the matplotlib.pyplot module, a stateful interface handling the logic for creating figures and axes and connecting them with the configured backend. It also keeps data structures for the current figure and axes, which are called upon by the plot commands.

This is the interface (matplotlib.pyplot) we have been using through most of this book, as it is simple, straightforward, and good enough for most of the tasks we were trying to accomplish. The matplotlib library was designed with this philosophy in mind. We must be able to draw plots with as few commands as possible, even just one command (for example, plt.plot([1,2,3,4,5]); plt.show() works). For these tasks, we don't want to be forced into thinking about objects, instances, methods, properties, rendering backends, figures, canvases, lines, and other graphical primitives.

If you are reading this book from the start, you probably noticed that some classes started appearing in various examples, such as FontProperties or AxesGrid, where we needed more than what is provided by the matplotlib.pyplot module.

This is the object-oriented programming interface that implements all the hidden hard stuff, such as rendering graphical elements, passing them to the platform's graphical toolkit, and handling user input (mouse and keystrokes). There is nothing to stop us from using the OO API, and that is what we are going to do.
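The same plot written in both styles shows the difference: pyplot functions act on an implicit "current" figure and axes, while the OO style keeps explicit references. This sketch uses the Agg backend so it runs headless; the titles are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# pyplot (stateful) style: each call acts on the "current" figure and axes
plt.figure()
plt.plot([1, 2, 3, 4, 5])
plt.title('pyplot style')
pyplot_ax = plt.gca()  # the Axes that pyplot created behind the scenes

# OO style: keep explicit references and call methods on them
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5])
ax.set_title('OO style')
```

Both styles can be mixed: the OO calls work on any Axes, including one obtained from plt.gca().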

So if we take a look at matplotlib as software, it consists of three parts:

► matplotlib.pyplot interface: This is a set of functions for the user to create plots like in MATLAB®

► matplotlib API (also called matplotlib frontend): This is a set of classes for the creation and management of figures, text, lines, plots, and so on

► backends: These are drawing drivers. They transform the frontend's abstract representation into a file or a display device
This backend layer contains concrete implementations of abstract interface classes. There are classes such as FigureCanvas (a surface to draw onto, like paper), Renderer (a paintbrush that does the drawing on the canvas), and Event (a class that handles the user's keystrokes and mouse events).

The code is also separated. The base abstract classes are in matplotlib.backend_bases, and every concrete implementation is in a separate module. For example, the GTK 3 backend is in matplotlib.backends.backend_gtk3agg.

In this stack, there is a hierarchy of Artist classes where most of the hard stuff is done. Artist knows about Renderer and how to use it to draw images on FigureCanvas. Most of the things we are interested in (text, lines, ticks, tick labels, images, and so on) are Artists or subclasses of the Artist class (located in the matplotlib.artist module).

The matplotlib.artist.Artist class contains all the shared properties of its children: coordinate transformation, clip box, label, user event handlers, and visibility.


[Figure: the Artist class hierarchy under matplotlib.artist.Artist, showing text.Annotation, text.TextWithDash, lines.Line2D, patches.Patch and its subclasses (patches.Polygon, patches.FancyArrow, patches.RegularPolygon, patches.Ellipse, patches.Rectangle, patches.Arrow), spines.Spine, legend.Legend, figure.Figure, axis.Axis with axis.XAxis and axis.YAxis, and axis.Tick with axis.XTick and axis.YTick]


In this figure, Artist is the base for most of the other classes. There are two basic categories of classes inherited from Artist. The first category is primitive artists, which are visible objects such as Line2D, Rectangle, Circle, and Text. The second category is composite artists, which are collections of other Artists, such as Axis, Tick, Axes, and Figure. For example, Figure has a background primitive artist, Rectangle, but also contains at least one composite artist, Axes.

Most of the plotting happens in the Axes class (matplotlib.axes.Axes). Background elements such as ticks, axis lines, the grid, and the color of the background patch are contained in Axes. Another important feature of Axes is that all the helper methods, for example, plot, hist, and imshow, create other primitive artists and add them to the Axes instance.


Axes.hist, for example, creates many matplotlib.patches.Rectangle instances and stores them in the Axes.patches collection.

Axes.plot creates one or more matplotlib.lines.Line2D instances and stores them in the Axes.lines collection.
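We can confirm that Axes helper methods create and store primitive artists by inspecting the Axes after calling hist and plot. The random data and bin count are arbitrary; with the default histtype='bar', each bin becomes one Rectangle in ax.patches.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()
rng = np.random.default_rng(0)

# hist: one Rectangle per bin, stored in ax.patches
counts, edges, bars = ax.hist(rng.normal(size=500), bins=10)
# plot: one Line2D per data series, stored in ax.lines
lines = ax.plot([0, 1, 2], [0, 1, 4])

n_rectangles = sum(isinstance(p, Rectangle) for p in ax.patches)
```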


How to do it... 


As an illustration, we will:

1. Instantiate the matplotlib Path object for custom drawing.

2. Construct the vertices of our object.

3. Construct the path's command codes to connect those vertices.

4. Create a patch.

5. Add it to the Axes instance of the figure.

The following code implements our intentions:

import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches

# add figure and axes
fig = plt.figure()
ax = fig.add_subplot(111)

coords = [
    (1., 0.),  # start position
    (0., 1.),
    (0., 2.),  # left side
    (1., 3.),
    (2., 3.),
    (3., 2.),  # top right corner
    (3., 1.),  # right side
    (2., 0.),
    (0., 0.),  # ignored
    ]

line_cmds = [Path.MOVETO,
             Path.LINETO,
             Path.LINETO,
             Path.LINETO,
             Path.LINETO,
             Path.LINETO,
             Path.LINETO,
             Path.LINETO,
             Path.CLOSEPOLY,
             ]

# construct path
path = Path(coords, line_cmds)

# construct path patch
patch = patches.PathPatch(path, lw=1,
                          facecolor='#A1D99B', edgecolor='#31A354')

# add it to *ax* axes
ax.add_patch(patch)

ax.text(1.1, 1.4, 'Python', fontsize=24)
ax.set_xlim(-1, 4)
ax.set_ylim(-1, 4)
plt.show()

The preceding code will generate the following: 



[Figure: a green octagon with the text "Python" inside it]


How it Works... 


For this octagon, we used the base path class matplotlib.path.Path, which supports the basic set of primitives for drawing lines and curves (moveto and lineto). These can be used to draw simple and also more advanced polygons using Bezier curves.
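For curves, Path offers command codes such as Path.CURVE4 (cubic Bezier), which consumes two control points and an end point. This sketch draws a closed Bezier-bounded shape; the vertex values are arbitrary, and the colors reuse the recipe's palette.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.path import Path

# CURVE4 = cubic Bezier: two control points followed by an end point
verts = [(0., 0.),   # start point (MOVETO)
         (0.5, 2.),  # control point 1
         (2.5, 2.),  # control point 2
         (3., 0.),   # end point
         (0., 0.)]   # ignored by CLOSEPOLY
codes = [Path.MOVETO, Path.CURVE4, Path.CURVE4, Path.CURVE4, Path.CLOSEPOLY]

path = Path(verts, codes)
fig, ax = plt.subplots()
ax.add_patch(patches.PathPatch(path, facecolor='#A1D99B', edgecolor='#31A354'))
ax.set_xlim(-0.5, 3.5)
ax.set_ylim(-0.5, 2.5)
```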
First, we specified a set of coordinates in data coordinates that we match with a set of path commands to act upon those coordinates (or vertices, if you like). With that, we instantiate matplotlib.path.Path. We then construct the patch instance matplotlib.patches.PathPatch with that path, which is a general polycurve path patch.

This patch can now be added to the figure's axes (our ax, one of the axes in the fig.axes collection) with ax.add_patch(), and we can render the figure to show the polygon.

What we didn't want to do in this example is use matplotlib.figure.Figure directly in place of the matplotlib.pyplot.figure() call. The reason for this is that the pyplot.figure() call does a lot in the background, such as reading the rc parameters from the matplotlibrc file (to load the default figsize, dpi, and figure color settings), setting up the figure manager class (Gcf), and so on. We could do all that, but until we really know what we are doing, this is the recommended way to create the figure.

As a general rule of thumb, unless we cannot achieve something via the pyplot interface, we should not reach for direct classes such as Figure, Axes, and Axis, because there is a lot of state management going on in the background; unless we are developing matplotlib, we should avoid bothering about it.
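To see part of what pyplot.figure() does in the background, the following sketch creates one figure through pyplot and another directly from matplotlib.figure.Figure with an Agg canvas attached by hand. Only the pyplot one is registered with pyplot's figure manager, as plt.get_fignums() shows. The figure size and plotted data are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

before = plt.get_fignums()

# pyplot route: reads rc defaults and registers a figure manager
managed = plt.figure()

# direct route: we build the Figure and attach a canvas ourselves
direct = Figure(figsize=(4, 3), dpi=100)
canvas = FigureCanvasAgg(direct)
ax = direct.add_subplot(111)
ax.plot([1, 2, 3])
canvas.draw()  # render with the Agg renderer; pyplot is not involved

after = plt.get_fignums()
```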


There's more... 


If you want interactivity and exploration, it is best to use matplotlib via the Python interactive shell. For this purpose, probably the best known is IPython. This gives you all the matplotlib features in a powerful and introspective shell with a rich set of features, such as history, inline plotting, and the possibility to share your work if you use the IPython Notebook.

The IPython Notebook is an interface to the IPython shell that can be accessed through the browser. Matplotlib has strong integration with this interface; indeed, the plots can be embedded directly in the browser interface.
Visualizations on the
Clouds with Plot.ly


In this chapter, you will cover the following recipes: 

► Creating line charts 

► Creating bar charts 

► Plotting a 3D trefoil knot

► Visualizing maps and bubbles 


Introduction 


Plot.ly is an online data visualization tool. It makes it possible for us to create and share
interactive charts. Plot.ly can be used in two ways: you can either log in to the website,
upload your data, and use the web interface to create a chart, or use their API. In this
chapter, we will focus on how to use their API.

The difference from matplotlib is that our charts are now created online and not on our
machine. This means that we will be able to access them from the Plot.ly website. With a
free account, all the charts that you make are public and anyone can access them.

To try the recipes presented in this chapter, you need a Plot.ly account and an API
key. You can create a Plot.ly account by going to https://plot.ly/ and clicking on Sign
In. After the account is created, you can go to the Settings section and generate an API key.

After this, the API binding for Python can be installed using pip, as we saw in the first chapter: 

$ pip install plotly 

Now we're ready to go! 
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Creating line charts 


In this recipe, we will see how to create a line chart. We have already introduced this kind
of chart in Chapter 3, Drawing Your First Plots and Customizing Them, where we saw
how to make these plots with matplotlib. This time, we'll focus on how to create and share
them with Plot.ly.


Getting ready 


Before starting, you need to set up your credentials for the Plot.ly platform in the
programming environment:

$ python -c "import plotly; plotly.tools.set_credentials_file(username='DemoAccount', api_key='mykey')"

Replace DemoAccount and mykey with your Plot.ly username and API key.


How to do it... 


The following code example demonstrates how to plot two curves. In particular, we will: 

1. Generate the data to plot (a sine and a cosine wave). 

2. Organize the data in the format required by Plot.ly.

3. Send a request to the server.

4. Receive a URL that points to our chart.

5. Run the following code: 

import plotly.plotly as py
from plotly.graph_objs import Scatter
import numpy as np

# 50 points between -2*pi and 2*pi
x = np.linspace(-2*np.pi, 2*np.pi, 50)

# one trace per curve; name is used in the legend
trace0 = Scatter(
    x=x.tolist(),
    y=np.sin(x).tolist(),
    name='sin(x)'
)

trace1 = Scatter(
    x=x.tolist(),
    y=np.cos(x).tolist(),
    name='cos(x)'
)

data = [trace0, trace1]

# sends the data to Plot.ly and returns the URL of the chart
unique_url = py.plot(data, filename='sin-cos')


How it Works... 


Here, we used the Scatter object twice. The first instance of Scatter (trace0) represents
the points of the sine curve, while the second instance (trace1) represents the cosine curve.
The parameters x and y of the constructor of these objects specify the data
points to plot, while the parameter name is used for the legend of the chart.
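Plotly also accepts traces written as plain dictionaries with a type key (the bubble-map recipe later in this chapter builds its traces that way, with type='scattergeo'). This sketch assumes the dictionary form to show how x, y, and name map onto the two traces of this recipe, without contacting the Plot.ly servers; nothing here calls the plotly package.

```python
import numpy as np

# Same sampling as the recipe: 50 points between -2*pi and 2*pi.
x = np.linspace(-2 * np.pi, 2 * np.pi, 50)

# Dictionary form of the two Scatter traces: x and y hold the data points,
# name is the text shown in the legend.
trace0 = dict(type="scatter", x=x.tolist(), y=np.sin(x).tolist(), name="sin(x)")
trace1 = dict(type="scatter", x=x.tolist(), y=np.cos(x).tolist(), name="cos(x)")

# Both traces are wrapped in a single list, as the recipe does before py.plot.
data = [trace0, trace1]

for trace in data:
    print(trace["name"], len(trace["x"]), trace["y"][0])
```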

After building the objects that represent the curves, we wrapped all the data in a list that was
then passed to the py.plot method. This method invokes Plot.ly and creates the chart for
us. Right after invoking this method, your default browser opens automatically on a page that
shows the chart. Here's what I got:



Now, the chart (and also the data) is stored on the Plot.ly servers, and we can access it
through the URL returned by the method plot. Every chart has a unique name specified
through the parameter filename of the method plot (in this case, the name of the chart
is sin-cos).
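Since the data travels to the Plot.ly servers, it has to be serialized, and Plot.ly presents it as JSON. Converting every NumPy array with .tolist(), as the recipe does, keeps the traces as plain Python data that any JSON encoder can handle. The plotly package ships its own encoder, so the conversion may not be strictly required there; this sketch uses only the standard json module and NumPy to show why plain lists are the safe choice.

```python
import json
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 50)

# The standard json module does not know how to encode NumPy arrays ...
try:
    json.dumps({"x": x})
    array_ok = True
except TypeError:
    array_ok = False

# ... but plain Python lists of floats encode and decode without trouble.
payload = json.dumps({"x": x.tolist(), "y": np.sin(x).tolist(), "name": "sin(x)"})
restored = json.loads(payload)

print(array_ok)              # False
print(len(restored["x"]))    # 50
```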


A distinctive feature of the Plot.ly interface is that the charts are interactive. If we hover the mouse
pointer over the curves, we see a tooltip box that shows the information of the point we're on.
We can also zoom in and out using the scroll wheel of the mouse.


There's more... 


Following the DATA link, we also have the opportunity to inspect the data:



This feature enables us to share the chart and the data at the same time. If we go into the
CODE section, we can also view the data in JSON format:
We even get a script (in Python, MATLAB, JavaScript, R, or Julia) that recreates the chart.
Here's the Python version:


# Find your api_key here: https://plot.ly/settings/api
import plotly.plotly as py
from plotly.graph_objs import *
py.sign_in('username', 'api_key')
trace1 = Scatter(
    x=[-6.283185307179586, -6.026728764029399, -5.770272220879212, -5.513815677729025, ...],
    y=[2.4492935982947064e-16, 0.25365458390950746, 0.49071755200393785, 0.6956825506034863, ...],
    name='sin(x)',
    xsrc='JustGlowing:327:a86cf0',
    ysrc='JustGlowing:327:aebbac'
)
trace2 = Scatter(
    x=[-6.283185307179586, -6.026728764029399, -5.770272220879212, -5.513815677729025, ...],
    y=[1.0, 0.9672948630390295, 0.8713187041233894, 0.7183493500977277, 0.5183925683105253, ...],
    name='cos(x)',
    xsrc='JustGlowing:327:a86cf0',
    ysrc='JustGlowing:327:ff5839'
)
Also, going into the Edit graph section, we are able to modify many aspects of the chart, such
as the theme and layout, and to add notes and so on without going back to the code:



This is very handy because it allows us to sketch a chart with a very simple code snippet and
then improve its appearance with a point-and-click interface.


Creating bar charts 


In this recipe, we will focus on how to create a bar chart to compare the occurrences of different 
crimes in Germany, Italy, and Spain in the year 2012. In particular, we will create a bar chart 
where we have three bars for each country, one with the number of burglaries, another with the 
number of robberies, and a third with the number of motor vehicle thefts. 


Getting ready 


For this recipe, we need the crim_gen.tsv file, which comes with this book. This file contains
the number of crimes reported to the police by year and by country. This data has been
downloaded from the Eurostat website (http://ec.europa.eu/eurostat).

We assume that this file is in the same directory as the code using it. 
How to do it...


The following code example demonstrates how to create a bar chart. We will: 

1. Open a TSV (tab-separated values) file.

2. Isolate and organize the data that we want to plot.

3. Invoke plotly to make the chart.

# bar charts
import pandas as pd

# the file mixes commas and tabs as separators; ': ' marks missing values
crimes = pd.read_csv('crim_gen.tsv', sep=',|\t', na_values=': ', engine='python')

# keep only Italy, Spain, and Germany
crimes = crimes[crimes.country.isin(['IT', 'ES', 'DE'])]

# one (country, 2012 count) matrix per type of crime, sorted by country
burglary = crimes.query('iccs == "burglary"')[['country', '2012 ']].sort_values(by='country').values
robbery = crimes.query('iccs == "robbery"')[['country', '2012 ']].sort_values(by='country').values
motor_theft = crimes.query('iccs == "theft_motor_vehicle"')[['country', '2012 ']].sort_values(by='country').values

import plotly.plotly as py
from plotly.graph_objs import *

trace1 = Bar(
    x=burglary[:, 0].tolist(),
    y=burglary[:, 1].tolist(),
    name='burglary'
)

trace2 = Bar(
    x=motor_theft[:, 0].tolist(),
    y=motor_theft[:, 1].tolist(),
    name='motor theft'
)

trace3 = Bar(
    x=robbery[:, 0].tolist(),
    y=robbery[:, 1].tolist(),
    name='robbery'
)

data = Data([trace1, trace2, trace3])
layout = Layout(
    barmode='group'
)

fig = Figure(data=data, layout=layout)
plot_url = py.plot(fig, filename='bars-crimes')



How it Works...


In this recipe, we have used pandas (which was introduced in the first chapters of this
book) to import and query the data. First, we isolated the data for the countries that we were
interested in using the isin method, and then by the types of crimes that we were interested
in. In particular, we have the three matrices burglary, robbery, and motor_theft, where
the first column is the country code and the second is the number of times that crime has
been reported in the country. Here's what the matrix motor_theft looks like:

[['DE', 70511.0],
 ['ES', 55197.0],
 ['IT', 196589.0]]
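The query, sort, and slice steps can be checked without the Eurostat file by rebuilding motor_theft from the three values printed above. The small DataFrame is an assumption that mimics only the columns the recipe uses (iccs, country, and '2012 ' with its trailing space); the real crim_gen.tsv has many more rows and columns. Note that DataFrame.sort(columns=...), which older pandas versions provided, has been removed; sort_values(by=...) does the same job.

```python
import pandas as pd

# Motor-vehicle theft counts for 2012, taken from the matrix above and
# deliberately listed out of order so that the sort has something to do.
crimes = pd.DataFrame({
    "iccs": ["theft_motor_vehicle"] * 3,
    "country": ["IT", "DE", "ES"],
    "2012 ": [196589.0, 70511.0, 55197.0],
})

# Select the crime type, keep two columns, sort by country, get a matrix.
motor_theft = (crimes.query('iccs == "theft_motor_vehicle"')[["country", "2012 "]]
               .sort_values(by="country")
               .values)

x = motor_theft[:, 0].tolist()  # first column: country codes (bar positions)
y = motor_theft[:, 1].tolist()  # second column: counts (bar heights)
print(x)
print(y)
```

Slicing with [:, 0] and [:, 1] is exactly what the Bar objects in the recipe receive as x and y.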

For each of the matrices, we instantiated a Bar object, just like we did for the Scatter object,
but this time the parameter x is the first column of the matrix and y is the second. The data
was again organized in a list and passed to the method plot. The result should be as follows:


[Figure: grouped bar chart of burglary, motor_theft, and robbery counts for DE, ES, and IT, with a y axis from 0 to 250k]


As we can see, we have three groups of bars, and each group contains three bars. 
There's more...


In this snippet, we also used a new object: Layout. This object enables us to specify the layout
properties of the chart. By setting the barmode parameter of this object to group, we specified
that the bars needed to be grouped. If we set this attribute to stack, we get something like
this:



This means that we now also know how to stack the bars instead of just grouping them.


Plotting a 3D trefoil knot


In this recipe, we will see how to plot a 3D trefoil knot. A trefoil knot is a closed curve with
three crossings. In this recipe, we will draw not just a curve, but a solid 3D curve. This is
beyond Plotly's built-in capabilities, so we will implement this functionality using a trick.


How to do it... 


In this recipe, we wiii: 

1. Generate all the points of the knot using parametric equations.

2. Organize the data as required by plotly.

3. Define a clean layout.

4. Invoke plotly to draw the chart:

import numpy as np
import plotly.plotly as py
from plotly.graph_objs import Scatter3d, Data, Layout
from plotly.graph_objs import Figure, Line, Margin, Marker
from numpy import linspace, pi, cos, sin

# parametric equations of the trefoil knot
phi = linspace(0, 2*pi, 250)
x = sin(phi) + 2*sin(2*phi)
y = cos(phi) - 2*cos(2*phi)
z = -sin(3*phi)

traces = list()

# the red channel grows with |z|, so the color encodes depth
colors = ['rgb(%d,50,210)' % c for c in np.abs(z / max(z)) * 255]

# 50 copies of the curve shifted around the main one give a solid look
for i in linspace(-np.pi, np.pi, 50):
    trace = Scatter3d(x=x + np.cos(i)*.5, y=y + np.sin(i)*.5, z=z,
                      mode='markers',
                      marker=Marker(color=colors, size=13))
    traces.append(trace)

data = Data(traces)

layout = Layout(showlegend=False, autosize=False,
                width=500, height=500,
                margin=Margin(l=0, r=0, b=0, t=65))

fig = Figure(data=data, layout=layout)
plot_url = py.plot(fig, filename='3d-trifoil')


How it Works... 


First, we generated all the points of our knot curve using the following parametric equations: 

x = sin(phi)+2*sin(2*phi)
y = cos(phi)-2*cos(2*phi)
z = -sin(3*phi)


This means that x, y, and z are parallel vectors, and the point (x[i], y[i], z[i]) is a point
of our curve in 3D space. To generate a solid curve, we created a series of other curves
around the main one, each shifted by 0.5 in a different direction of the xy-plane. Indeed, we generated
a set of Scatter3d objects (each one is a curve).
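Both ideas, the closed parametric curve and the ring of shifted copies, can be checked numerically with NumPy alone. This sketch reuses the equations and offsets from the recipe; the closure and distance checks are our own additions and do not involve plotly.

```python
import numpy as np

# Parametric equations of the trefoil knot, sampled as in the recipe.
phi = np.linspace(0, 2 * np.pi, 250)
x = np.sin(phi) + 2 * np.sin(2 * phi)
y = np.cos(phi) - 2 * np.cos(2 * phi)
z = -np.sin(3 * phi)

# The curve is closed: phi = 0 and phi = 2*pi map to the same point.
start = np.array([x[0], y[0], z[0]])
end = np.array([x[-1], y[-1], z[-1]])
print(np.allclose(start, end))  # True

# The "solid" trick: 50 copies of the curve, each shifted by 0.5 in a
# different direction of the xy-plane (z is left unchanged).
offsets = np.linspace(-np.pi, np.pi, 50)
copies = [(x + np.cos(a) * 0.5, y + np.sin(a) * 0.5, z) for a in offsets]

# Every copy stays exactly 0.5 away from the main curve, point by point.
dist = [np.hypot(cx - x, cy - y) for cx, cy, _ in copies]
print(np.allclose(dist, 0.5))  # True
```

In other words, each point of the knot is replaced by a small horizontal circle of radius 0.5, and the large markers fill the gaps between the copies.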

To give the knot a 3D effect, we drew each point of each curve with a different color. The
color is a function of the z coordinate: it is blue when z is equal to 0 and gradually
becomes purple as z moves away from 0.
The colors were specified with a list of RGB triplets. Indeed, if we take a look at the values of
the list colors, it will look like this:

['rgb(0,50,210)',
 'rgb(19,50,210)',
 'rgb(38,50,210)',
 'rgb(57,50,210)',
 'rgb(76,50,210)',
 ...
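The list can be regenerated on its own. This sketch reuses the sampling and the color expression from the recipe unchanged; only the print calls and the comment are added.

```python
import numpy as np

phi = np.linspace(0, 2 * np.pi, 250)
z = -np.sin(3 * phi)

# Red channel grows with |z| (normalized by the largest z), from 0 up to 255;
# green and blue stay fixed, so colors go from blue (z = 0) towards purple.
colors = ['rgb(%d,50,210)' % c for c in np.abs(z / max(z)) * 255]

print(len(colors))   # 250
print(colors[:5])
```

The '%d' format truncates each red value to an integer, which is why the first entries step by roughly 19.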


Each element is a string that contains the RGB value of one of the points of the curve. 
The results are as follows: 



Here, we can not only zoom in and out, but also rotate the figure.
Visualizing maps and bubbles


In this recipe, we will see how to visualize a map and place a bubble on each country, in this 
case some European countries. The size of each bubble will be proportional to the number of 
total reported crimes in that country. 


Getting ready 


Here, we will again use the crim_gen.tsv file, which comes with this book, assuming that
this file is in the same directory as the code using it.


How to do it... 


For the following recipe, we will proceed as follows: 

1. Import and query the data. 

2. Define the coordinates of each country. 

3. Create an entry for each country. 

4. Define the layout for the chart. 

5. Invoke plotly.

import plotly.plotly as py
from plotly.graph_objs import *

import pandas as pd

crimes = pd.read_csv('crim_gen.tsv', sep=',|\t', na_values=': ', engine='python')

crimes = crimes[crimes.country.isin(['IT', 'ES', 'DE', 'FR', 'NO', 'FI'])]

# (country, total 2012 crimes) pairs, sorted by the number of crimes
total_crimes = crimes.query('iccs == "TOTAL"')[['country', '2012 ']].sort_values(by='2012 ').values

# (longitude, latitude) of the bubble for each country
coords = {'IT': (13.007813, 42.553080), 'ES': (-3.867188, 39.909736),
          'DE': (9.316406, 50.736455), 'FR': (2.636719, 46.195042),
          'NO': (8.613281, 61.100789), 'FI': (25.839844, 62.431074)}

scale = 300000
countries = []

for info in total_crimes:
    c = coords[info[0]]
    country = dict(
        type='scattergeo',
        lon=[c[0]],
        lat=[c[1]],
        text=info[0] + ': ' + str(info[1]),
        name=info[0],
        marker=dict(
            size=info[1] / scale,   # bubble size proportional to the crimes
            sizemode='diameter',
            color='red',
            line=dict(width=1, color='red')
        ))
    countries.append(country)

layout = dict(
    title='2012 Reported crimes',
    showlegend=True,
    geo=dict(
        scope='europe'
    ),
)

fig = dict(data=countries, layout=layout)

url = py.plot(fig, validate=False, filename='bubble-map-crimes')


How it Works... 


Here, we have isolated the data for six countries: Spain, Italy, Germany, France, Norway, and
Finland. For each of these countries, we defined the coordinates at which to place the bubble in the
dictionary coords. Then, for each country, we created a dictionary with the details of the
bubble: its size, the string shown in the tooltip, its color, and its geographical coordinates.
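The per-country dictionary can be factored into a small function to make the size arithmetic explicit. The helper make_bubble and the total of 3,000,000 crimes are hypothetical, used only to show that a bubble's size is its total divided by scale; the coordinates and scale come from the recipe. Here sizemode is placed inside marker, which is where Plotly's scattergeo schema defines it.

```python
# Coordinates (longitude, latitude) copied from the recipe's coords dictionary.
coords = {'IT': (13.007813, 42.553080), 'ES': (-3.867188, 39.909736),
          'DE': (9.316406, 50.736455), 'FR': (2.636719, 46.195042),
          'NO': (8.613281, 61.100789), 'FI': (25.839844, 62.431074)}
scale = 300000


def make_bubble(code, total):
    """Hypothetical helper: build the scattergeo dict for one country."""
    lon, lat = coords[code]
    return dict(
        type='scattergeo',
        lon=[lon],
        lat=[lat],
        text=code + ': ' + str(total),   # tooltip text
        name=code,                        # legend entry
        marker=dict(
            size=total / scale,           # bubble diameter in pixels
            sizemode='diameter',
            color='red',
            line=dict(width=1, color='red'),
        ),
    )


# Hypothetical total, used only to show the arithmetic: 3,000,000 / 300,000 = 10.
bubble = make_bubble('IT', 3000000.0)
print(bubble['marker']['size'])    # 10.0
print(bubble['lon'], bubble['lat'])
```

Dividing by a fixed scale keeps the bubbles proportional to each other; a larger scale shrinks all of them uniformly.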

Then, we created the layout for the chart. What tells Plotly that this chart contains a map
is the scattergeo type of each trace, together with the parameter geo of the layout, which configures
that map. With this parameter, we specify the scope of the map, which in this case is Europe.
The resulting figure should be as follows: